Remove object in PDF with iTextSharp and save

Question

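This is a case of OCR gone wrong. I need to remove hidden text from a PDF, and I am having a hard time figuring out how to do it.

The hidden text lives in an XObject that is always named /QuickPDFsomething, stored in the /XObject dictionary of the page's /Resources dictionary.

I have tried the two approaches below, but neither worked, so I am obviously doing something wrong.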
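To make that location concrete, here is a minimal sketch that lists the XObject names starting with /QuickPDF on a given page. The class and method names (`QuickPdfLocator`, `FindQuickPdfXObjects`) are hypothetical and only for illustration; the sketch assumes iTextSharp 5 and simply returns an empty list when the page has no /Resources or /XObject dictionary.

```csharp
using System;
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class QuickPdfLocator
{
    // Hypothetical helper: returns the XObject resource names on a page
    // whose name starts with "/QuickPDF".
    public static List<PdfName> FindQuickPdfXObjects(PdfReader reader, int pageNumber)
    {
        var found = new List<PdfName>();

        PdfDictionary page = reader.GetPageN(pageNumber);
        PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
        if (resources == null) return found;

        PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);
        if (xobjects == null) return found;

        foreach (PdfName name in xobjects.Keys)
        {
            // PdfName.ToString() includes the leading slash, e.g. "/QuickPDFsomething"
            if (name.ToString().StartsWith("/QuickPDF", StringComparison.Ordinal))
            {
                found.Add(name);
            }
        }
        return found;
    }
}
```

Calling `QuickPdfLocator.FindQuickPdfXObjects(reader, 1)` on the first page would show whether the resource is present before attempting any removal.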

Option 1 - Kill the object - Acrobat states 'An error exists on this page. Acrobat may not display the page correctly', but the page looks fine. PitStop throws 'Critical parser failure: XObject resource missing'.

PdfReader.KillIndirect(obj);
oPdfFile.GetPdfReader().RemoveUnusedObjects();
var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
stamper.Close();

Option 2 - CleanupProcessor - throws an exception: 'A Graphics object cannot be created from an image that has an indexed pixel format'.

var stamper = new PdfStamper(oPdfFile.GetPdfReader(), new FileStream(@"C:\temp.pdf", FileMode.Create));
var cleanupLocations = new List<PdfCleanUpLocation>();
var pageRect = oPdfFile.GetPdfReader().GetCropBox(1);
cleanupLocations.Add(new PdfCleanUpLocation(1, pageRect));
PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanupLocations, stamper);
cleaner.CleanUp();
stamper.Close();

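I want to delete the /QuickPDF object (41 0 R in this image) and also remove it from the content stream, which invokes it with /QuickPDF Do.

Unfortunately, I cannot provide the PDF.

Any hints on how to do this?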

Answer 1

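I don't like answering my own question, but I want to share the solution I found in case anyone else needs it.

After playing with this for a few days, I found that Option 1 above does delete the object, and that the exception I got from PitStop occurred because the content stream still referenced the /QuickPDF XObject.

So I tried following @mkl's solution in "Removing Watermark from PDF iTextSharp", but it kept putting unwanted data into the content stream, which rotated my PDF.

Then I found @Chris's solution in "Removing Watermark from a PDF using iTextSharp". I am not sure how stable this solution is, but it seems to work.

Here is my solution for removing /QuickPDF from the content stream: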
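In other words, two things point at the XObject: the entry in the page's /XObject resource dictionary and the `Do` operator in the content stream. As a rough sketch of the resource side only, assuming iTextSharp 5, the entry can be dropped from the dictionary at the same time the object is killed. The names `QuickPdfResources` and `RemoveXObject` are hypothetical.

```csharp
using iTextSharp.text.pdf;

static class QuickPdfResources
{
    // Hypothetical helper: kills the indirect object behind an XObject name
    // and removes its entry from the page's /XObject dictionary.
    // Returns false when the page has no such resource.
    public static bool RemoveXObject(PdfReader reader, int pageNumber, PdfName name)
    {
        PdfDictionary page = reader.GetPageN(pageNumber);
        PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
        if (resources == null) return false;

        PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);
        if (xobjects == null || !xobjects.Contains(name)) return false;

        // Get() returns the indirect reference (e.g. 41 0 R) rather than the
        // resolved stream, which is what KillIndirect needs.
        PdfObject reference = xobjects.Get(name);
        PdfReader.KillIndirect(reference);

        // Drop the dictionary entry so nothing points at the killed object.
        xobjects.Remove(name);
        return true;
    }
}
```

This does not touch the content stream, so any remaining `/QuickPDF... Do` call would still trigger the "XObject resource missing" error until it is removed as well.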

// Requires: using System.Text; using iTextSharp.text.pdf;
int pgNumber = 1;

PdfDictionary page = oPdfFile.GetPdfReader().GetPageN(pgNumber);
// GetAsArray returns null when /Contents is a single stream rather than an array,
// in which case nothing below runs
PdfArray contentarray = page.GetAsArray(PdfName.CONTENTS);
if (contentarray != null)
{
    // Loop through each content stream of the page
    for (int j = 0; j < contentarray.Size; j++)
    {
        PRStream stream = (PRStream)contentarray.GetAsStream(j);
        // ASCII decoding replaces bytes above 0x7F with '?', so this assumes
        // the content streams hold no binary data
        string content = Encoding.ASCII.GetString(PdfReader.GetStreamBytes(stream));
        string[] lines = content.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            // Blank out every line that mentions the /QuickPDF XObject
            if (lines[i].Contains("/QuickPDF"))
            {
                lines[i] = string.Empty;
            }
        }

        // Write the filtered content back into the stream
        byte[] outbytes = Encoding.ASCII.GetBytes(string.Join("\n", lines));
        stream.SetData(outbytes);
    }
}
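The snippet above stops before saving and skips pages whose /Contents is a single stream. A possible variation, offered only as a sketch, goes through `PdfReader.GetPageContent` and `SetPageContent` (which cover both cases) and removes just the `/QuickPDF... Do` invocation with a regular expression instead of blanking whole lines. The class `QuickPdfCleaner`, its method names, and the file paths are hypothetical, and the regex is an assumption about how the operator appears in the stream; it would also match the same text inside a string literal.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using iTextSharp.text.pdf;

static class QuickPdfCleaner
{
    // Matches a name starting with /QuickPDF followed by the Do operator.
    private static readonly Regex DoCall =
        new Regex(@"/QuickPDF\S*\s+Do(?=\s|$)", RegexOptions.Compiled);

    // Pure string helper: strips every "/QuickPDF... Do" invocation.
    public static string StripQuickPdfCalls(string content)
    {
        return DoCall.Replace(content, string.Empty);
    }

    // Rewrites the content of every page and saves the result.
    public static void CleanAndSave(string inputPath, string outputPath)
    {
        // ISO-8859-1 maps each byte to one char, so bytes round-trip unchanged.
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        var reader = new PdfReader(inputPath);
        try
        {
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                string content = latin1.GetString(reader.GetPageContent(p));
                reader.SetPageContent(p, latin1.GetBytes(StripQuickPdfCalls(content)));
            }

            // Only drops objects that nothing references any more.
            reader.RemoveUnusedObjects();

            using (var output = new FileStream(outputPath, FileMode.Create))
            {
                var stamper = new PdfStamper(reader, output);
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

A call such as `QuickPdfCleaner.CleanAndSave(@"C:\in.pdf", @"C:\temp.pdf")` would write the cleaned copy. Note that the XObject itself stays in the file as long as the page's /XObject dictionary still lists it, because `RemoveUnusedObjects` keeps anything that is still referenced.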
