When an image is placed on OverContent with PdfStamper, how can it be found later?

Question

When a barcode image is placed on a PDF with a stamper like this:

  PdfContentByte page = stamper.GetOverContent(i);
  image.SetAbsolutePosition(x, y);
  page.AddImage(image);

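it displays correctly when the PDF is rendered in a viewer, but the code below (adapted from here) fails to find it. The code does not recognize that the image exists at all. It does find images placed in the PDF by Acrobat Pro XI, but not images added in the way shown above.

What is the correct way to place a barcode image on a PDF in iTextSharp so that the image is included in the PdfDictionary? What needs to change: the code above, or the code below?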

for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
    PdfDictionary pg = pdf.GetPageN(pageNumber);
    PdfObject obj = FindImageInPDFDictionary(pg);
    if (obj != null)
    {
        // object number of the image XObject the search returned
        int XrefIndex = ((PRIndirectReference)obj).Number;
        PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
        PdfStream pdfStream = (PdfStream)pdfObj;
        byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStream);
        if (bytes != null)
        {
            using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
            {
                memStream.Position = 0;
                System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);
                // now we have an image and can examine it
                // to see if it is a barcode
            }
        }
    }
}

Answer 1

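First of all, an iText Image object is not necessarily a bitmap image; it can also contain, for example, only vector graphics. The extraction code, on the other hand, only considers bitmap images.

In the case at hand, though, the image turned out to be a bitmap image after all.

There is nothing special about the way iText adds the image to the OverContent. The problem is the FindImageInPDFDictionary method from the accepted answer to the question you refer to: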

private static PdfObject FindImageInPDFDictionary(PdfDictionary pg) {
    PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));

    PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
    if (xobj != null) {
        foreach (PdfName name in xobj.Keys) {
            PdfObject obj = xobj.Get(name);
            if (obj.IsIndirect()) {
                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);

                PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));

                //image at the root of the pdf
                if (PdfName.IMAGE.Equals(type)) {
                    return obj;
                }// image inside a form
                else if (PdfName.FORM.Equals(type)) {
                    return FindImageInPDFDictionary(tg);
                } //image inside a group
                else if (PdfName.GROUP.Equals(type)) {
                    return FindImageInPDFDictionary(tg);
                }
            }
        }
    }
    return null;
}

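It has more than one shortcoming:

- It only considers the first Image, Form, or Group xobject in the resources of the pg dictionary, because it returns immediately in any of these cases, without caring whether the recursive call in the latter two cases returns a non-null result.
- Setting the above aside, it only inspects the page resources and the resources of contained form xobjects and groups, nothing more. Thus,
  - it does not check whether the image resources it finds are actually used on the page, so it may return images that are not on the page at all,
  - it ignores inline images contained in the content stream, and
  - it ignores images contained in patterns or Type 3 fonts.
- It ignores whether the image it found has a mask. Sometimes the mask carries the main information of the resulting image while the base image merely determines the color; ink signature images in particular often contain the path of the pen in the mask, while the whole base image is filled with the ink color.
- It cannot return more than one image per page.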
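
The first and the last of these points can be illustrated with a variant that collects every image reference instead of returning at the first hit. This is only a sketch: the names ImageResourceWalker and CollectImages and the visited set are assumptions made for illustration, not part of iTextSharp or of the referenced answer. It still has all the other shortcomings listed above (unused resources, inline images, patterns, Type 3 fonts, masks).

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

public static class ImageResourceWalker
{
    // Hypothetical helper: gathers references to all image XObjects reachable
    // through the XObject resources of a page or form XObject dictionary.
    public static void CollectImages(PdfDictionary container,
                                     IList<PdfObject> found,
                                     HashSet<int> visited)
    {
        PdfDictionary res = container.GetAsDict(PdfName.RESOURCES);
        if (res == null) return;
        PdfDictionary xobj = res.GetAsDict(PdfName.XOBJECT);
        if (xobj == null) return;

        foreach (PdfName name in xobj.Keys)
        {
            PdfObject obj = xobj.Get(name);
            if (obj == null || !obj.IsIndirect()) continue;

            // Skip objects already seen, so that a self-referencing form
            // cannot cause endless recursion.
            int number = ((PdfIndirectReference)obj).Number;
            if (!visited.Add(number)) continue;

            PdfDictionary target = PdfReader.GetPdfObject(obj) as PdfDictionary;
            if (target == null) continue;

            PdfName subtype = target.GetAsName(PdfName.SUBTYPE);
            if (PdfName.IMAGE.Equals(subtype))
                found.Add(obj);                        // keep going, do not return
            else if (PdfName.FORM.Equals(subtype))
                CollectImages(target, found, visited); // descend into the form
        }
    }
}
```

It would be called with the page dictionary, an empty list, and an empty HashSet<int>; afterwards the list holds one reference per image resource found.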

Furthermore, if it is used as in that answer

PdfDictionary pg = pdf.GetPageN(pageNumber);

// recursively search pages, forms and groups for images.
PdfObject obj = FindImageInPDFDictionary(pg);

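then only the resources directly attached to the page object are inspected, but resources can also be inherited from ancestor nodes in the page tree.

You should use the iText parsing framework instead, cf. for example the answer to "Extract Images from PDF coordinates using iText" or variants of it (which often reference a MyImageRenderListener class). In particular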

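- it returns all findings via callbacks, not merely one per page;
- it does not ignore some of the images it is meant to consider;
- it scans the content stream, so it finds inline images and only finds those resources that are actually used;
- it returns the mask of an image, if applicable;
- as a bonus, it returns the position and transformation with which the image is used.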
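
A minimal sketch of such a render listener in iTextSharp 5 might look as follows. The class names ImageCollector, FoundImage, and ImageFinder are assumptions made for illustration (the referenced answers typically call their listener MyImageRenderListener). The sketch only records the image bytes, the file type reported by iTextSharp, the translation part of the transformation matrix, and whether the image is inline; it does not deal with masks. GetImage() may throw for image formats iTextSharp cannot decode, which real code would have to handle.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: records every image the page content actually draws.
public class ImageCollector : IRenderListener
{
    public class FoundImage
    {
        public byte[] Bytes;     // image bytes as exported by iTextSharp
        public string FileType;  // file type as reported by iTextSharp
        public float X, Y;       // translation part of the image's CTM
        public bool IsInline;    // true if the image has no indirect reference
    }

    public readonly List<FoundImage> Images = new List<FoundImage>();

    // Text events are of no interest here.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return;

        Matrix ctm = renderInfo.GetImageCTM();
        Images.Add(new FoundImage
        {
            Bytes = image.GetImageAsBytes(),
            FileType = image.GetFileType(),
            X = ctm[Matrix.I31],
            Y = ctm[Matrix.I32],
            IsInline = renderInfo.GetRef() == null
        });
    }
}

public static class ImageFinder
{
    // Runs the listener over every page and returns the findings per page.
    public static Dictionary<int, List<ImageCollector.FoundImage>> FindAll(PdfReader pdf)
    {
        var result = new Dictionary<int, List<ImageCollector.FoundImage>>();
        PdfReaderContentParser parser = new PdfReaderContentParser(pdf);
        for (int pageNumber = 1; pageNumber <= pdf.NumberOfPages; pageNumber++)
        {
            ImageCollector collector = parser.ProcessContent(pageNumber, new ImageCollector());
            result[pageNumber] = collector.Images;
        }
        return result;
    }
}
```

Each entry in the result can then be examined, for instance to check whether it is a barcode, in the same way as the System.Drawing.Image in the question.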

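It is not perfect, though: in particular, it does not scan patterns and Type 3 fonts for images (although the parsing framework does allow trying to extract Type 3 fonts used as text), and it does not look at inherited resources either.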


itext