PDF: Error reading a content stream
I am trying to capture PostScript calls to show and store the current font and font size so that I can output them in PDF text objects.
PDF file
Input Postscript Program
But identify gives me an error:
$ identify pd0.pdf
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
pd0.pdf[0] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[1] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
pd0.pdf[2] PBM 612x792 612x792+0+0 16-bit Bilevel Gray 61KB 0.000u 0:00.000
And the Ghostscript output does not give me the detail I need to understand the problem:
$ gsnd -dPDFDEBUG pd0.pdf
GPL Ghostscript 9.18 (2015-10-05)
Copyright (C) 2015 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
<<
/Root 1 0 R
/Size 12 >>
%Resolving: [1 0]
<<
/Type /Catalog /Pages 2 0 R
>>
endobj
%Resolving: [2 0]
<<
/Kids [
3 0 R
6 0 R
9 0 R
]
/Type /Pages /Count 3 >>
endobj
%Resolving: [3 0]
<<
/Parent 2 0 R
/Contents [
5 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F1 4 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [6 0]
<<
/Parent 2 0 R
/Contents [
8 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F2 7 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [9 0]
<<
/Parent 2 0 R
/Contents [
11 0 R
]
/MediaBox [
0.0 0.0 612.0 792.0 ]
/Resources <<
/Font <<
/F3 10 0 R
>>
/ProcSet [
/PDF /Text ]
>>
/Type /Page >>
endobj
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [1 0]
%Resolving: [2 0]
Processing pages 1 through 3.
Page 1
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [3 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [5 0]
<<
/Length 15660 >>
stream
%FilePosition: 471
endobj
BT
F1
10.0 Tf
%Resolving: [4 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
Page 2
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [6 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [8 0]
<<
/Length 31667 >>
stream
%FilePosition: 16474
endobj
BT
F2
10.0 Tf
%Resolving: [7 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
Page 3
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [3 0]
%Resolving: [6 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [9 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [1 0]
%Resolving: [2 0]
%Resolving: [11 0]
<<
/Length 8335 >>
stream
%FilePosition: 48487
endobj
BT
F3
10.0 Tf
%Resolving: [10 0]
<<
/Type /Font /SubType /Type1 /BaseFont /Palatino-Roman >>
endobj
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** This file had errors that were repaired or ignored.
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
GS>
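Can anyone help me understand what is wrong with the PDF file I am producing?
There are several errors in the PDF. Depending on the PDF viewer in question, a smaller or larger subset of them has to be fixed before the PDF displays as intended.
The page content streams
The contents of your page content streams look like this: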
BT F1 10.0 Tf 30.0 750.0 Td (<< ) Tj ET BT F1 10.0 Tf 50.0 738.0 Td (/) Tj ET [...]
The error here is in the font selection instructions:
F1 10.0 Tf
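The font name operand F1 is not given as a PDF name object (recognizable by its leading slash) but as a bare token, a form normally reserved for operators. The instruction should read /F1 10.0 Tf.
(As an aside, these content streams are unnecessarily bloated: most of the individual text objects draw only one to three glyphs, and each carries its own, always identical, font selection instruction. That is not an error as such, but it is completely unnecessary.)
Furthermore, as @usr2564301 pointed out, the stream lengths appear to be off by 1.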
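The three points above (name operand, one font selection per text object, exact stream length) can be sketched in Python. This is only an illustration under assumptions: the helper names pdf_string, content_stream and stream_object are hypothetical, and it positions each run with an absolute Tm instead of the Td used in the original stream, because Td offsets accumulate inside a single BT ... ET pair.

```python
def pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def content_stream(font_name: str, size: float, runs) -> bytes:
    """Build one text object: a single Tf, then one positioned Tj per run.

    runs is an iterable of (x, y, text) tuples in page coordinates.
    """
    # The font is referenced as a name object, hence the leading slash.
    ops = ["BT", f"/{font_name} {size} Tf"]
    for x, y, text in runs:
        # Tm sets an absolute position; Td would be relative to the last line start.
        ops.append(f"1 0 0 1 {x} {y} Tm {pdf_string(text)} Tj")
    ops.append("ET")
    return "\n".join(ops).encode("latin-1")


def stream_object(num: int, data: bytes) -> bytes:
    """Wrap stream data in an indirect object with an exact /Length."""
    # /Length counts only the bytes between "stream\n" and the end-of-line
    # marker that precedes "endstream"; that marker itself is not counted.
    header = f"{num} 0 obj\n<< /Length {len(data)} >>\nstream\n".encode("ascii")
    return header + data + b"\nendstream\nendobj\n"


data = content_stream("F1", 10.0, [(30.0, 750.0, "<< "), (50.0, 738.0, "/")])
print(stream_object(5, data).decode("latin-1"))
```

Taking the length from the encoded bytes, rather than counting characters while writing, is what keeps /Length from drifting by one.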
The font resources
Each font resource looks like this:
<<
/Type /Font
/SubType /Type1
/BaseFont /Palatino-Roman
>>
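First of all, there is a spelling issue: as @KenS already pointed out, the correct key is Subtype, not SubType.
But there is another issue: such short font resource dictionaries are allowed only for the standard 14 fonts, and only up to PDF 1.7; PDF 2.0 no longer allows them at all. Since Palatino-Roman clearly is not one of the standard 14 fonts, the resource is incomplete in any case.
According to Table 109, "Entries in a Type 1 font dictionary", of ISO 32000-2,
- Type, Subtype, and BaseFont are always Required,
- FirstChar, LastChar, Widths, and FontDescriptor are Required, but were optional for the standard 14 fonts in PDF 1.0-1.7,
- Name is Required in PDF 1.0, Optional in PDF 1.1 through 1.7, and deprecated in PDF 2.0,
- Encoding and ToUnicode are always Optional.
Depending on the PDF viewer you try, the requirements may appear laxer, but if you do not meet the requirements of the specification, any PDF processor may justifiably reject your PDF.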
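The rules listed above can be turned into a small check. The following Python sketch is an assumption-laden illustration, not a validator: the function name missing_type1_entries and the dict representation of a font dictionary are hypothetical, and it only looks at key presence, not at value types.

```python
# The 14 standard Type 1 fonts that PDF 1.0-1.7 processors must know.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

ALWAYS_REQUIRED = ("Type", "Subtype", "BaseFont")
REQUIRED_UNLESS_STANDARD_14 = ("FirstChar", "LastChar", "Widths", "FontDescriptor")


def missing_type1_entries(font: dict, pdf_2_0: bool = False) -> list[str]:
    """Return the required keys that a Type 1 font dictionary lacks."""
    # Keys are case-sensitive, so "SubType" does not satisfy "Subtype".
    missing = [key for key in ALWAYS_REQUIRED if key not in font]
    # The standard-14 exemption exists only before PDF 2.0.
    exempt = not pdf_2_0 and font.get("BaseFont") in STANDARD_14
    if not exempt:
        missing += [key for key in REQUIRED_UNLESS_STANDARD_14 if key not in font]
    return missing


font = {"Type": "Font", "SubType": "Type1", "BaseFont": "Palatino-Roman"}
print(missing_type1_entries(font))
# ['Subtype', 'FirstChar', 'LastChar', 'Widths', 'FontDescriptor']
```

Applied to the font dictionary from the question, the check reports both the misspelled key and the four entries that a non-standard font needs.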
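The cross references
@usr2564301 also mentioned that many cross-reference table entries (and also the reference to the start of the cross-reference table itself) are off by 1.
Indeed, they do not point to the object number or the xref keyword but to the white space before it. Because that leading white space has to be ignored anyway, many PDF processors will not notice.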
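One way to avoid such off-by-one offsets is to record the output position immediately before writing each object number and the xref keyword. The Python sketch below assumes a hypothetical build_pdf helper that numbers the given object bodies from 1 and treats object 1 as the catalog; it illustrates the bookkeeping only and is not a complete PDF writer.

```python
def build_pdf(bodies: list[bytes]) -> bytes:
    """Serialize object bodies (numbered from 1) with a matching xref table."""
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        # Record the offset of the first digit of the object number,
        # not of the line break that precedes it.
        offsets.append(len(out))
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_pos = len(out)  # offset of the "x" in "xref"
    size = len(bodies) + 1
    out += f"xref\n0 {size}\n".encode("ascii")
    # Every entry is exactly 20 bytes, including its two-byte line ending.
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += f"trailer\n<< /Size {size} /Root 1 0 R >>\n".encode("ascii")
    out += f"startxref\n{xref_pos}\n%%EOF\n".encode("ascii")
    return bytes(out)


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
print(pdf.decode("ascii"))
```

Because every offset is read from the buffer length at the moment of writing, the table stays correct even if the line endings or object contents change later.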