Traverse whole PDF and change blue color to black (change color of underlines as well) + iText
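I am using the following code to remove the blue color from the text of a PDF. It works fine for the text color, but it does not change the color of the underlines.
Part of the original file:
Manipulated file:
As you can see in the manipulated file above, the underline color has not changed.
I have been searching for a solution for two weeks. Can anyone help with this? Below is my code for changing the color: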
public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
try (InputStream resource = getClass().getResourceAsStream(source);
PdfReader pdfReader = new PdfReader(source);
OutputStream result = new FileOutputStream(filename);
PdfWriter pdfWriter = new PdfWriter(result);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter);) {
PdfCanvasEditor editor = new PdfCanvasEditor() {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {
String operatorString = operator.toString();
if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("rg"));
if (currentlyReplacedBlack == null) {
Color currentFillColor =getGraphicsState().getFillColor();
if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
currentlyReplacedBlack = currentFillColor;
super.write(processor, new PdfLiteral("rg"), listobj);
}
}
} else if (currentlyReplacedBlack != null) {
if (currentlyReplacedBlack instanceof DeviceCmyk) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("k"));
super.write(processor, new PdfLiteral("k"), listobj);
} else if (currentlyReplacedBlack instanceof DeviceGray) {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("g"));
super.write(processor, new PdfLiteral("g"), listobj);
} else {
List<PdfObject> listobj = new ArrayList<>();
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfNumber(0));
listobj.add(new PdfLiteral("rg"));
super.write(processor, new PdfLiteral("rg"), listobj);
}
currentlyReplacedBlack = null;
}
super.write(processor, operator, operands);
}
Color currentlyReplacedBlack = null;
final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
editor.editPage(pdfDocument, i);
}
}
File file = new File(source);
file.delete();
}
Here is the original file:
https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf
Related links:
Traverse whole PDF and change some attribute with some object in it using iText
Removing Watermark from PDF iTextSharp
Maven dependency details:
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext7-core</artifactId>
<version>7.1.5</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>5.0.6</version>
</dependency>
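Edit:
The accepted answer does not work for the following files:
Please help.
(The sample code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in the tags or the question text, but your example code seems to indicate that this is the combination you chose.)
Replacing blue fill colors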
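Your test, based on the original code, explicitly tries to change only the text color. The "underlines" in your document, however, are (in terms of PDF drawing) not part of the text but drawn as simple paths. Thus, the original code explicitly leaves the underlines untouched, and it has to be adapted for your task.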
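To see why a text-only approach cannot reach the underline, it helps to look at the operators involved. The following is a minimal, library-free sketch (not iText API; `OperatorKinds` and its methods are hypothetical names for illustration) that sorts content-stream operators into the groups from the PDF specification's operator tables. The sample stream is made up: it shows a blue word via `Tj` inside `BT`/`ET` and draws its underline as a separate thin rectangle path (`re`) that is then filled (`f`). It assumes whitespace-separated tokens and no string operands containing whitespace.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of iText): groups content-stream operators
// following the operator tables of the PDF specification.
class OperatorKinds {
    enum Kind { TEXT_SHOWING, PATH_CONSTRUCTION, PATH_PAINTING, COLOR, OTHER }

    static final Set<String> TEXT_SHOWING_OPS = new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));
    static final Set<String> PATH_CONSTRUCTION_OPS = new HashSet<>(Arrays.asList("m", "l", "c", "v", "y", "h", "re"));
    static final Set<String> PATH_PAINTING_OPS = new HashSet<>(Arrays.asList("S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "n"));
    static final Set<String> COLOR_OPS = new HashSet<>(Arrays.asList("g", "G", "rg", "RG", "k", "K", "cs", "CS", "sc", "SC", "scn", "SCN"));

    static Kind classify(String operator) {
        if (TEXT_SHOWING_OPS.contains(operator)) return Kind.TEXT_SHOWING;
        if (PATH_CONSTRUCTION_OPS.contains(operator)) return Kind.PATH_CONSTRUCTION;
        if (PATH_PAINTING_OPS.contains(operator)) return Kind.PATH_PAINTING;
        if (COLOR_OPS.contains(operator)) return Kind.COLOR;
        return Kind.OTHER;
    }

    // Naive check: operators start with a letter or are ' or "; numbers, /Names and (strings) are operands.
    // Assumption: whitespace-separated tokens, no strings containing whitespace, no true/false/null operands.
    static boolean isOperator(String token) {
        char c = token.charAt(0);
        return Character.isLetter(c) || c == '\'' || c == '"';
    }

    public static void main(String[] args) {
        // Made-up underlined blue word: text shown with Tj, underline painted as a filled rectangle.
        String stream = "BT /F1 12 Tf 0 0 1 rg 72 712 Td (Hello) Tj ET 0 0 1 rg 72 710 30 0.6 re f";
        for (String token : stream.split("\\s+")) {
            if (isOperator(token)) {
                System.out.println(token + " -> " + classify(token));
            }
        }
    }
}
```

Only `Tj` is a text-showing operator here; the underline consists of `re` and `f`, so code that only reacts to text-showing operators never touches it.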
But actually, changing everything blue to black is easier to implement than changing only blue text, e.g.:
try ( PdfReader pdfReader = new PdfReader(SOURCE_PDF);
PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(ChangeColor test testChangeFillRgbBlueToBlack)
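For readers who want to experiment without iText, here is a hypothetical, library-free sketch of the same rule the code above applies: an `rg` whose three operands are approximately `0 0 1` is replaced by `0 g`, with the same 0.01 tolerance as `isApproximatelyEqual`. `FillBlueToBlack` is an illustration name, not iText API. It assumes an uncompressed content stream given as a string, whitespace-separated tokens and no string operands containing whitespace; real content streams need a proper lexer, which is what `PdfCanvasProcessor` provides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: token-level version of "pure DeviceRGB blue fill -> black DeviceGray fill".
class FillBlueToBlack {
    static String rewrite(String contentStream) {
        List<String> out = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : contentStream.trim().split("\\s+")) {
            if (!isOperator(token)) {
                operands.add(token); // collect operands until their operator arrives
                continue;
            }
            if (token.equals("rg") && operands.size() == 3
                    && approx(operands.get(0), 0) && approx(operands.get(1), 0) && approx(operands.get(2), 1)) {
                out.add("0"); // replace the blue fill color by black in DeviceGray
                out.add("g");
            } else {
                out.addAll(operands); // everything else is copied unchanged
                out.add(token);
            }
            operands.clear();
        }
        out.addAll(operands); // trailing operands without an operator (malformed input) are kept
        return String.join(" ", out);
    }

    // Naive check, see assumptions above: operators start with a letter or are ' or ".
    static boolean isOperator(String token) {
        char c = token.charAt(0);
        return Character.isLetter(c) || c == '\'' || c == '"';
    }

    // Same tolerance as isApproximatelyEqual; non-numeric operands never match.
    static boolean approx(String token, float reference) {
        try {
            return Math.abs(reference - Float.parseFloat(token)) < 0.01f;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("0 0 1 rg 72 710 30 0.6 re f"));     // filled underline
        System.out.println(rewrite("0 0 1 RG 72 710 m 102 710 l S"));   // stroked underline
    }
}
```

The second call leaves the `RG` stroke color untouched, which is exactly the gap of a fill-only approach.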
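Please note that this is only a proof of concept, not a final, complete solution. In particular:
- It only looks at fill (non-stroking) colors. In your case that suffices because both your text (as usual) and your underlines use fill colors only: the underlines are not drawn as stroked lines but as thin filled rectangles.
- Only RGB blue is taken into account, and only such blue set by an rg instruction, not one set by sc or scn, let alone a blue composed from other colors using funky blend modes. This may be a problem in particular for documents explicitly designed for print, which are likely to use CMYK colors.
- PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed XObjects or patterns, so some content may not be found. It can be generalized easily enough.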
Result:
Replacing blue fill and stroke colors
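Testing the code above, you soon found documents in which the underlines were not changed. As it turns out, these underlines actually are drawn as stroked lines, not as filled rectangles as above.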
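Whether an underline takes its color from the fill or the stroke color depends on the operator that paints the path. The following hypothetical helper (illustration names, not iText API) encodes the path-painting operator table of the PDF specification: `S`/`s` use the stroke color, `f`/`F`/`f*` the fill color, `B`, `B*`, `b`, `b*` both, and `n` neither. It also maps each fill color setter to its stroking counterpart, which is simply the upper-case spelling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: which current color each path-painting operator consumes.
// Stroke color is set by G, RG, K, SC, SCN; fill color by g, rg, k, sc, scn.
class PaintingColors {
    static final Set<String> FILLING = new HashSet<>(Arrays.asList("f", "F", "f*", "B", "B*", "b", "b*"));
    static final Set<String> STROKING = new HashSet<>(Arrays.asList("S", "s", "B", "B*", "b", "b*"));
    static final Set<String> FILL_COLOR_SETTERS = new HashSet<>(Arrays.asList("g", "rg", "k", "cs", "sc", "scn"));

    static boolean usesFillColor(String painter) { return FILLING.contains(painter); }

    static boolean usesStrokeColor(String painter) { return STROKING.contains(painter); }

    // Fill color setters are lower case; their stroking counterparts use the same letters in upper case.
    static String strokeCounterpart(String fillColorSetter) {
        if (!FILL_COLOR_SETTERS.contains(fillColorSetter)) {
            throw new IllegalArgumentException("not a fill color operator: " + fillColorSetter);
        }
        return fillColorSetter.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // A stroked underline such as "0 0 1 RG 72 710 m 102 710 l S" is painted by S,
        // so only the stroke color (RG) matters for it.
        System.out.println("S uses fill color: " + usesFillColor("S"));
        System.out.println("S uses stroke color: " + usesStrokeColor("S"));
        System.out.println("stroke counterpart of rg: " + strokeCounterpart("rg"));
    }
}
```

This is why the fill-only editor missed these underlines and why an `RG` branch is needed alongside the `rg` one.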
Thus, to properly edit such documents, you have to edit not only the fill colors but also the stroke colors, e.g. like this:
try ( PdfReader pdfReader = new PdfReader(SOURCE_PDF);
PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfCanvasEditor()
{
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)
Results:
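Replacing different shades of blue from other RGB color spaces
Testing the code above again, you found documents in which the blue colors were not changed. As it turns out, these blues are not set in the standard DeviceRGB color space but in ICCBased color spaces, more precisely ICC-profiled RGB color spaces. In particular, other color setting operators are used than before, sc / scn instead of rg. Furthermore, one document does not use pure blue 0 0 1 but the blue .17255 .3098 .63529.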
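The number of numeric operands of sc / scn depends on the current color space, and three operands do not by themselves prove RGB. The following hypothetical helper (illustration only, not iText API) lists the component counts the PDF specification defines for common color space families; for ICCBased spaces the count is the /N entry of the embedded profile stream, passed in here as `iccN`.

```java
// Hypothetical helper: number of color components per color space family,
// i.e. the number of numeric operands a color setter expects in that space.
class ColorComponents {
    static int count(String family, int iccN) {
        switch (family) {
            case "DeviceGray": case "CalGray": case "Indexed": case "Separation":
                return 1;
            case "DeviceRGB": case "CalRGB": case "Lab":
                return 3;
            case "DeviceCMYK":
                return 4;
            case "ICCBased":
                return iccN; // 1, 3 or 4, taken from the profile's /N entry
            default:
                throw new IllegalArgumentException("not covered by this sketch: " + family);
        }
    }

    public static void main(String[] args) {
        String[] families = {"DeviceGray", "DeviceRGB", "CalRGB", "Lab", "DeviceCMYK"};
        for (String family : families) {
            System.out.println(family + ": " + count(family, 0));
        }
        System.out.println("ICCBased with an RGB profile (N = 3): " + count("ICCBased", 3));
    }
}
```

DeviceRGB, CalRGB, Lab and three-component ICCBased spaces all take three operands, so treating every three-operand sc / scn as RGB is a simplification that happens to fit these documents.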
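If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB color, as they do here (in general this is an oversimplification, as Lab and other color spaces also have 3 components, but your documents appear to be RGB oriented), and if we are less strict in recognizing blue, we can generalize the code above like this: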
class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
PdfNumber number0 = new PdfNumber(0);
operands.set(0, number0);
operands.set(1, number0);
operands.set(2, number0);
}
}
super.write(processor, operator, operands);
}
boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
float r = ((PdfNumber)red).floatValue();
float g = ((PdfNumber)green).floatValue();
float b = ((PdfNumber)blue).floatValue();
return b > .5f && r < .9f*b && g < .9f*b;
}
return false;
}
final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}
(ChangeColor helper class)
Use it like this:
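The blue test in `isBlue` is a heuristic: blue must exceed 0.5, and red and green must each stay below 90% of blue. The following standalone sketch copies those thresholds (the class name `BlueHeuristic` is hypothetical) to show which sample colors it accepts, including the .17255 .3098 .63529 blue from one of the documents.

```java
import java.util.Arrays;

// Standalone copy of the isBlue thresholds from AllRgbBlueToBlackConverter, for experimentation.
class BlueHeuristic {
    static boolean isBlue(float r, float g, float b) {
        // Blue must be reasonably bright and clearly dominate red and green.
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    public static void main(String[] args) {
        float[][] samples = {
            {0f, 0f, 1f},                // pure blue
            {.17255f, .3098f, .63529f},  // the blue from one of the documents
            {0f, 1f, 1f},                // cyan
            {.5f, .5f, .5f},             // mid gray
            {0f, 0f, .4f}                // dark navy
        };
        for (float[] c : samples) {
            System.out.println(Arrays.toString(c) + " -> " + isBlue(c[0], c[1], c[2]));
        }
    }
}
```

Cyan and gray are rejected as intended, but note that very dark blues such as 0 0 .4 are rejected as well because of the `b > .5f` threshold; adjust the limits if your documents use darker shades.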
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
we get: