Traverse whole PDF and change blue color to black (change color of underlines as well) + iText

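I am using the code below to remove blue color from PDF text. It works for the text itself, but it changes only the text color and leaves the underline color untouched.

Section of the original file: [screenshot]

Manipulated file: [screenshot]

As you can see in the manipulated file above, the underline color did not change.

I have been looking for a solution for two weeks. Can anyone help with this? Below is my color-changing code: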

public void testChangeBlackTextToGreenDocument(String source, String filename) throws IOException {
    try (InputStream resource = getClass().getResourceAsStream(source);
            PdfReader pdfReader = new PdfReader(source);
            OutputStream result = new FileOutputStream(filename);
            PdfWriter pdfWriter = new PdfWriter(result);
            PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter);) {
        PdfCanvasEditor editor = new PdfCanvasEditor() {

            @Override
            protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands) {
                
                String operatorString = operator.toString();

                if (TEXT_SHOWING_OPERATORS.contains(operatorString)) {
                    List<PdfObject> listobj = new ArrayList<>();
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfNumber(0));
                    listobj.add(new PdfLiteral("rg"));
                    if (currentlyReplacedBlack == null) {
                        Color currentFillColor =getGraphicsState().getFillColor();
                        if (ColorConstants.GREEN.equals(currentFillColor) || ColorConstants.CYAN.equals(currentFillColor) || ColorConstants.BLUE.equals(currentFillColor)) {
                            currentlyReplacedBlack = currentFillColor;
                            super.write(processor, new PdfLiteral("rg"), listobj);
                        }
                    }
                } else if (currentlyReplacedBlack != null) {
                    if (currentlyReplacedBlack instanceof DeviceCmyk) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("k"));
                        super.write(processor, new PdfLiteral("k"), listobj);
                    } else if (currentlyReplacedBlack instanceof DeviceGray) {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("g"));
                        super.write(processor, new PdfLiteral("g"), listobj);
                    } else {
                        List<PdfObject> listobj = new ArrayList<>();
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfNumber(0));
                        listobj.add(new PdfLiteral("rg"));
                        super.write(processor, new PdfLiteral("rg"), listobj);
                    }
                    currentlyReplacedBlack = null;
                }

                super.write(processor, operator, operands);
            }

            Color currentlyReplacedBlack = null;

            final List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
        };
        for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
            editor.editPage(pdfDocument, i);
        }
    }
    File file = new File(source);
    file.delete();
}

Here is the original file: https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/originalFile.pdf

Related links:

Traverse whole PDF and change some attribute with some object in it using iText

Removing Watermark from PDF iTextSharp

Maven dependency details:

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itext7-core</artifactId>
        <version>7.1.5</version>
        <type>pom</type>
    </dependency>

    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.0.6</version>
    </dependency>

Edit:

The accepted answer does not work for the following files:

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/021549Orig1s025_aprepitant_clinpharm_prea_Mac.pdf (page 41)

https://raad-dev-test.s3.ap-south-1.amazonaws.com/36/2019-08-30/400_206494S5_avibactam_and_ceftazidine_unireview_prea_Mac.pdf (page 60)

Please help.

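(The example code here uses iText 7 for Java. You mentioned neither the iText version nor your programming environment in the tags or the question text, but your example code suggests that this is your combination of choice.)

Replacing blue fill colors

Your test, which is based on the original code, explicitly attempts to change only the text color. The "underline" in your document, however, is (as far as PDF drawing is concerned) not part of the text but is drawn as a simple path. Thus, the original code deliberately does not touch the underline and has to be adapted for your task.

Actually, changing everything blue to black is easier to implement than changing only blue text, e.g.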

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }
            
            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

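(ChangeColor test testChangeFillRgbBlueToBlack)

Beware, this is only a proof of concept, not a final and complete solution. In particular:

  • It only looks at fill (non-stroking) colors. This suffices in your case because both your text (as usual) and your underlines use fill colors only - the underline is actually not drawn as a stroked line but as a thin filled rectangle.
  • Only RGB blue is considered, and only blue that is set using the rg instruction, not blue set using sc or scn, let alone blue combined from other colors via funky blend modes. This may be a problem in particular for documents explicitly designed for printing (which probably use CMYK colors).
  • The PdfCanvasEditor only inspects and edits the content stream of the page itself, not the content streams of displayed XObjects or patterns; thus, some content may not be found. It can be generalized fairly easily.

Result: [screenshot]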
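The second limitation listed above could be loosened by also looking at the CMYK setters (k and K). As a rough sketch in plain Java, without any iText types, one might convert the four CMYK operands to RGB with a naive device conversion and then apply the same tolerance test as isApproximatelyEqual. The class name CmykBlueCheck and its method names are made up for illustration, and the conversion is an assumption that ignores ICC profiles and real ink behavior.

```java
// Hypothetical helper, not part of iText: decides whether DeviceCMYK operands
// (as used by the "k" and "K" operators) describe an approximately pure blue.
final class CmykBlueCheck {
    // Same tolerance as isApproximatelyEqual in the answer code.
    private static final float TOLERANCE = 0.01f;

    // Naive DeviceCMYK to DeviceRGB conversion: component = 1 - min(1, ink + black).
    // Assumption: no ICC profile or overprint handling is taken into account.
    static float[] toRgb(float c, float m, float y, float k) {
        return new float[] {
            1f - Math.min(1f, c + k),
            1f - Math.min(1f, m + k),
            1f - Math.min(1f, y + k)
        };
    }

    // True if the converted color is close to RGB 0 0 1.
    static boolean isApproximatelyBlue(float c, float m, float y, float k) {
        float[] rgb = toRgb(c, m, y, k);
        return Math.abs(rgb[0]) < TOLERANCE
            && Math.abs(rgb[1]) < TOLERANCE
            && Math.abs(rgb[2] - 1f) < TOLERANCE;
    }

    public static void main(String[] args) {
        System.out.println(isApproximatelyBlue(1f, 1f, 0f, 0f)); // full cyan + magenta: true
        System.out.println(isApproximatelyBlue(0f, 0f, 0f, 1f)); // pure black ink: false
    }
}
```

In a write override, such a check would presumably sit next to the rg branch and replace a matching k or K instruction with a black one, analogous to the RGB case.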

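Replacing blue fill and stroke colors

Testing the code above, you soon found documents in which the underlines were not changed. It turns out that these underlines are actually drawn as stroked lines, not as filled rectangles as above.

Thus, to properly edit such documents you have to edit not only the fill colors but also the stroke colors, e.g. like this: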

try (   PdfReader pdfReader = new PdfReader(SOURCE_PDF);
        PdfWriter pdfWriter = new PdfWriter(RESULT_PDF);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
    PdfCanvasEditor editor = new PdfCanvasEditor()
    {
        @Override
        protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
        {
            String operatorString = operator.toString();

            if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
                    return;
                }
            }

            if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
                if (isApproximatelyEqual(operands.get(0), 0) &&
                        isApproximatelyEqual(operands.get(1), 0) &&
                        isApproximatelyEqual(operands.get(2), 1)) {
                    super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
                    return;
                }
            }

            super.write(processor, operator, operands);
        }

        boolean isApproximatelyEqual(PdfObject number, float reference) {
            return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
        }

        final String SET_FILL_RGB = "rg";
        final String SET_STROKE_RGB = "RG";
    };
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

(ChangeColor tests testChangeRgbBlueToBlackControlOfNitrosamineImpuritiesInSartansRev and testChangeRgbBlueToBlackEdqmReportsIssuesOfNonComplianceWithToothMac)

Result: [screenshot]
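The fill and stroke branches above differ only in the operator pair (rg becomes g, RG becomes G), so they could be folded into a single lookup. The following is a rough sketch of that idea on a simplified, hypothetical string model of an instruction: operands first, operator last, mirroring how the operands list in the code above ends with the operator. BlueToGrayRewriter and its methods are invented names, not iText API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical string-based model of a content stream instruction:
// operands first, operator last, e.g. ["0", "0", "1", "RG"].
final class BlueToGrayRewriter {
    // Maps an RGB setter to the matching DeviceGray setter
    // (fill stays fill, stroke stays stroke); null for any other operator.
    static String graySetterFor(String operator) {
        switch (operator) {
            case "rg": return "g";
            case "RG": return "G";
            default:   return null;
        }
    }

    // Returns "0 g" / "0 G" for an approximately pure blue rg / RG instruction,
    // and the unchanged instruction otherwise.
    static List<String> rewrite(List<String> instruction) {
        if (instruction.size() != 4) return instruction;
        String grayOperator = graySetterFor(instruction.get(3));
        if (grayOperator == null) return instruction;
        if (near(instruction.get(0), 0) && near(instruction.get(1), 0) && near(instruction.get(2), 1)) {
            return Arrays.asList("0", grayOperator);
        }
        return instruction;
    }

    // Tolerance comparison as in isApproximatelyEqual; non-numeric tokens never match.
    private static boolean near(String token, float reference) {
        try {
            return Math.abs(reference - Float.parseFloat(token)) < 0.01f;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite(Arrays.asList("0", "0", "1", "RG"))); // [0, G]
        System.out.println(rewrite(Arrays.asList("1", "0", "0", "rg"))); // [1, 0, 0, rg]
    }
}
```

With real iText objects the same shape would apply to the PdfLiteral operator and the PdfObject operands, but that wiring is left out here.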

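Replacing different shades of blue from other RGB-like color spaces

Testing the code above again, you found documents in which the blue colors were not changed. It turns out that these blue colors do not come from the standard DeviceRGB color space but from ICCBased color spaces, more precisely from profiled RGB color spaces. In particular, different color-setting operators are used than before: sc / scn instead of rg. Furthermore, one document does not use a pure blue 0 0 1 but a .17255 .3098 .63529 blue.

If we assume that sc and scn instructions with three numeric arguments set some flavor of RGB color, as they do here (in general this is an oversimplification, since Lab and other color spaces also have 3 components, but your documents appear to be RGB oriented), and if we are less strict in recognizing blue, we can generalize the code above as follows: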

class AllRgbBlueToBlackConverter extends PdfCanvasEditor {
    @Override
    protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
    {
        String operatorString = operator.toString();

        if (RGB_SETTER_CANDIDATES.contains(operatorString) && operands.size() == 4) {
            if (isBlue(operands.get(0), operands.get(1), operands.get(2))) {
                PdfNumber number0 = new PdfNumber(0);
                operands.set(0, number0);
                operands.set(1, number0);
                operands.set(2, number0);
            }
        }

        super.write(processor, operator, operands);
    }

    boolean isBlue(PdfObject red, PdfObject green, PdfObject blue) {
        if (red instanceof PdfNumber && green instanceof PdfNumber && blue instanceof PdfNumber) {
            float r = ((PdfNumber)red).floatValue();
            float g = ((PdfNumber)green).floatValue();
            float b = ((PdfNumber)blue).floatValue();
            return b > .5f && r < .9f*b && g < .9f*b;
        }
        return false;
    }

    final Set<String> RGB_SETTER_CANDIDATES = new HashSet<>(Arrays.asList("rg", "RG", "sc", "SC", "scn", "SCN"));
}

(ChangeColor helper class)
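To get a feeling for what the isBlue condition accepts, it can be tried out in isolation. The sketch below copies only the comparison onto plain floats; BlueHeuristicDemo is a made-up name, and the sample colors other than 0 0 1 and .17255 .3098 .63529 are chosen here purely for illustration.

```java
// Stand-alone copy of the isBlue condition on plain floats, for experimentation only.
final class BlueHeuristicDemo {
    // Same condition as in AllRgbBlueToBlackConverter.isBlue:
    // blue must be fairly strong and clearly dominate red and green.
    static boolean isBlue(float r, float g, float b) {
        return b > .5f && r < .9f * b && g < .9f * b;
    }

    public static void main(String[] args) {
        System.out.println(isBlue(0f, 0f, 1f));                // pure blue: true
        System.out.println(isBlue(.17255f, .3098f, .63529f));  // blue from one of the documents: true
        System.out.println(isBlue(0f, 0f, 0f));                // black: false
        System.out.println(isBlue(1f, 1f, 1f));                // white: false
        System.out.println(isBlue(0f, 1f, 1f));                // cyan: false
        System.out.println(isBlue(0f, 0f, .4f));               // dark navy: false
    }
}
```

Note that cyan and dark blues below the .5 threshold are not recognized, so the thresholds may need tuning if such colors should be converted as well.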

Using the AllRgbBlueToBlackConverter class like this

try (   PdfReader pdfReader = new PdfReader(INPUT);
        PdfWriter pdfWriter = new PdfWriter(OUTPUT);
        PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) ) {
    PdfCanvasEditor editor = new AllRgbBlueToBlackConverter();
    for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
    {
        editor.editPage(pdfDocument, i);
    }
}

we get: [screenshot]