遍历整个 PDF 并仅删除超链接(注释)的下划线 + iText
Traverse whole PDF and Remove underlines of hyperlinks (annotations) only + iText
我已经使用以下 link 代码成功更改了下划线的颜色。任何人都可以帮助我如何从 PDF 中删除下划线,我使用以下 link 代码找到的下划线。
下面是我的代码,用于查找 hyperlinks 并将其颜色更改为黑色。我必须修改此代码以删除那些下划线。
PdfCanvasEditor editor = new PdfCanvasEditor() {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
editor.editPage(pdfDocument, i);
}
已编辑:
已接受的答案不适用于以下文件:
请帮忙。
如所引用问题上下文中的评论所述
it is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions by instructions dropping the current path without drawing it. If only doing so in case of the applicable current color being blue, that would likely do the job in case of your example PDFs. But beware, in documents with other graphics with blue elements (e.g. logos), these would be mutilated, too.
这是以下内容编辑器所做的:
class PdfGraphicsRemoverByColor extends PdfCanvasEditor {
public PdfGraphicsRemoverByColor(Color color) {
this.color = color;
}
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (color.equals(getGraphicsState().getFillColor())) {
switch (operatorString) {
case "f":
case "f*":
case "F":
operatorString = "n";
break;
case "b":
case "b*":
operatorString = "s";
break;
case "B":
case "B*":
operatorString = "S";
break;
}
}
if (color.equals(getGraphicsState().getStrokeColor())) {
switch (operatorString) {
case "s":
case "S":
operatorString = "n";
break;
case "b":
case "B":
operatorString = "f";
break;
case "b*":
case "B*":
operatorString = "f*";
break;
}
}
operator = new PdfLiteral(operatorString);
operands.set(operands.size() - 1, operator);
super.write(processor, operator, operands);
}
final Color color;
}
(RemoveGraphicsByColor帮手class)
这样应用:
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfGraphicsRemoverByColor(ColorConstants.BLUE);
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
参考问题中的示例文件 Control_of_nitrosamine_impurities_in_sartans__rev.pdf
、EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf
和 originalFile.pdf
,得到:
和
和
请注意,这只是概念验证,并非最终的完整解决方案。特别是:
只考虑RGB蓝色。这可能是一个问题,特别是在文档明确设计用于打印(可能使用 CMYK 颜色)的情况下。
所有路径填充和描边只要是蓝色就会被丢弃。根据您的文件,这可能需要过滤。
PdfCanvasEditor
仅检查和编辑页面本身的内容流,而不是显示的 XObject 或模式的内容流;因此,可能找不到某些内容。它可以很容易地推广。
与其他 RGB 颜色空间不同的蓝色阴影
测试上面的代码,您发现文档中的蓝线没有被删除。事实证明,这些蓝色不是来自 DeviceRGB 标准 RGB,而是来自 ICCBased 色彩空间,更准确地说,是对 RGB 色彩空间的分析。此外,在一份文件中,使用的不是纯蓝色 0 0 1
而是 .17255 .3098 .63529
蓝色。
为了也能够处理这些文档,必须推广上述方法;例如我们可以使用 Predicate<Color>
而不是单个特定的 Color
,例如像这样:
class PdfGraphicsRemoverByColorPredicate extends PdfCanvasEditor {
public PdfGraphicsRemoverByColorPredicate(Predicate<Color> colorPredicate) {
this.colorPredicate = colorPredicate;
}
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (colorPredicate.test(getGraphicsState().getFillColor())) {
switch (operatorString) {
case "f":
case "f*":
case "F":
operatorString = "n";
break;
case "b":
case "b*":
operatorString = "s";
break;
case "B":
case "B*":
operatorString = "S";
break;
}
}
if (colorPredicate.test(getGraphicsState().getStrokeColor())) {
switch (operatorString) {
case "s":
case "S":
operatorString = "n";
break;
case "b":
case "B":
operatorString = "f";
break;
case "b*":
case "B*":
operatorString = "f*";
break;
}
}
operator = new PdfLiteral(operatorString);
operands.set(operands.size() - 1, operator);
super.write(processor, operator, operands);
}
final Predicate<Color> colorPredicate;
}
(RemoveGraphicsByColor帮手class)
这样应用:
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfGraphicsRemoverByColorPredicate(RemoveGraphicsByColor::isRgbBlue);
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(RemoveGraphicsByColor testRemoveAllBlueLinesFrom*
测试)
到使用此谓词方法的新示例文件
public static boolean isRgbBlue(Color color) {
if (color instanceof CalRgb || color instanceof DeviceRgb || (color instanceof IccBased && color.getNumberOfComponents() == 3)) {
float[] components = color.getColorValue();
float r = components[0];
float g = components[1];
float b = components[2];
return b > .5f && r < .9f*b && g < .9f*b;
}
return false;
}
(RemoveGraphicsByColor辅助方法)
一个得到
和
注意,上面的警告仍然适用。
我已经使用以下 link 代码成功更改了下划线的颜色。任何人都可以帮助我如何从 PDF 中删除下划线,我使用以下 link 代码找到的下划线。
下面是我的代码,用于查找 hyperlinks 并将其颜色更改为黑色。我必须修改此代码以删除那些下划线。
PdfCanvasEditor editor = new PdfCanvasEditor() {
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (SET_FILL_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("g"), Arrays.asList(new PdfNumber(0), new PdfLiteral("g")));
return;
}
}
if (SET_STROKE_RGB.equals(operatorString) && operands.size() == 4) {
if (isApproximatelyEqual(operands.get(0), 0) &&
isApproximatelyEqual(operands.get(1), 0) &&
isApproximatelyEqual(operands.get(2), 1)) {
super.write(processor, new PdfLiteral("G"), Arrays.asList(new PdfNumber(0), new PdfLiteral("G")));
return;
}
}
super.write(processor, operator, operands);
}
boolean isApproximatelyEqual(PdfObject number, float reference) {
return number instanceof PdfNumber && Math.abs(reference - ((PdfNumber)number).floatValue()) < 0.01f;
}
final String SET_FILL_RGB = "rg";
final String SET_STROKE_RGB = "RG";
};
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++) {
editor.editPage(pdfDocument, i);
}
已编辑:
已接受的答案不适用于以下文件:
请帮忙。
如所引用问题上下文中的评论所述
it is easy to make the editor class above remove vector graphics by replacing fill or stroke instructions by instructions dropping the current path without drawing it. If only doing so in case of the applicable current color being blue, that would likely do the job in case of your example PDFs. But beware, in documents with other graphics with blue elements (e.g. logos), these would be mutilated, too.
这是以下内容编辑器所做的:
class PdfGraphicsRemoverByColor extends PdfCanvasEditor {
public PdfGraphicsRemoverByColor(Color color) {
this.color = color;
}
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (color.equals(getGraphicsState().getFillColor())) {
switch (operatorString) {
case "f":
case "f*":
case "F":
operatorString = "n";
break;
case "b":
case "b*":
operatorString = "s";
break;
case "B":
case "B*":
operatorString = "S";
break;
}
}
if (color.equals(getGraphicsState().getStrokeColor())) {
switch (operatorString) {
case "s":
case "S":
operatorString = "n";
break;
case "b":
case "B":
operatorString = "f";
break;
case "b*":
case "B*":
operatorString = "f*";
break;
}
}
operator = new PdfLiteral(operatorString);
operands.set(operands.size() - 1, operator);
super.write(processor, operator, operands);
}
final Color color;
}
(RemoveGraphicsByColor帮手class)
这样应用:
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfGraphicsRemoverByColor(ColorConstants.BLUE);
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
参考问题中的示例文件 Control_of_nitrosamine_impurities_in_sartans__rev.pdf
、EDQM_reports_issues_of_non-compliance_with_tooth__Mac.pdf
和 originalFile.pdf
,得到:
和
和
请注意,这只是概念验证,并非最终的完整解决方案。特别是:
只考虑RGB蓝色。这可能是一个问题,特别是在文档明确设计用于打印(可能使用 CMYK 颜色)的情况下。
所有路径填充和描边只要是蓝色就会被丢弃。根据您的文件,这可能需要过滤。
PdfCanvasEditor
仅检查和编辑页面本身的内容流,而不是显示的 XObject 或模式的内容流;因此,可能找不到某些内容。它可以很容易地推广。
与其他 RGB 颜色空间不同的蓝色阴影
测试上面的代码,您发现文档中的蓝线没有被删除。事实证明,这些蓝色不是来自 DeviceRGB 标准 RGB,而是来自 ICCBased 色彩空间,更准确地说,是对 RGB 色彩空间的分析。此外,在一份文件中,使用的不是纯蓝色 0 0 1
而是 .17255 .3098 .63529
蓝色。
为了也能够处理这些文档,必须推广上述方法;例如我们可以使用 Predicate<Color>
而不是单个特定的 Color
,例如像这样:
class PdfGraphicsRemoverByColorPredicate extends PdfCanvasEditor {
public PdfGraphicsRemoverByColorPredicate(Predicate<Color> colorPredicate) {
this.colorPredicate = colorPredicate;
}
@Override
protected void write(PdfCanvasProcessor processor, PdfLiteral operator, List<PdfObject> operands)
{
String operatorString = operator.toString();
if (colorPredicate.test(getGraphicsState().getFillColor())) {
switch (operatorString) {
case "f":
case "f*":
case "F":
operatorString = "n";
break;
case "b":
case "b*":
operatorString = "s";
break;
case "B":
case "B*":
operatorString = "S";
break;
}
}
if (colorPredicate.test(getGraphicsState().getStrokeColor())) {
switch (operatorString) {
case "s":
case "S":
operatorString = "n";
break;
case "b":
case "B":
operatorString = "f";
break;
case "b*":
case "B*":
operatorString = "f*";
break;
}
}
operator = new PdfLiteral(operatorString);
operands.set(operands.size() - 1, operator);
super.write(processor, operator, operands);
}
final Predicate<Color> colorPredicate;
}
(RemoveGraphicsByColor帮手class)
这样应用:
try ( PdfReader pdfReader = new PdfReader(INPUT);
PdfWriter pdfWriter = new PdfWriter(OUTPUT);
PdfDocument pdfDocument = new PdfDocument(pdfReader, pdfWriter) )
{
PdfCanvasEditor editor = new PdfGraphicsRemoverByColorPredicate(RemoveGraphicsByColor::isRgbBlue);
for (int i = 1; i <= pdfDocument.getNumberOfPages(); i++)
{
editor.editPage(pdfDocument, i);
}
}
(RemoveGraphicsByColor testRemoveAllBlueLinesFrom*
测试)
到使用此谓词方法的新示例文件
public static boolean isRgbBlue(Color color) {
if (color instanceof CalRgb || color instanceof DeviceRgb || (color instanceof IccBased && color.getNumberOfComponents() == 3)) {
float[] components = color.getColorValue();
float r = components[0];
float g = components[1];
float b = components[2];
return b > .5f && r < .9f*b && g < .9f*b;
}
return false;
}
(RemoveGraphicsByColor辅助方法)
一个得到
和
注意,上面的警告仍然适用。