java.lang.OutOfMemoryError when validating pdf with pdfbox preflight 2.0.13
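I'm not sure whether anyone has run into this before, but I'm getting an out-of-memory error when validating a PDF. I'm posting here for visibility; any help would be great.
If anyone has any ideas, please share. I'm really stuck at this point.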
Stuff I've tried
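- Followed the advice in the PDFBox FAQ wiki, without success
- Increased the max heap size from 2GB to 4GB
- Removed the JVM arg -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
- Tried JDK 1.7
- Used temp files (from the wiki)
- Disabled caching of PDImageXObject (from the wiki)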
My Environment
- Linux 64-bit (Arch Linux)
- Java 8
- PDFBox/Preflight ver. 2.0.13
- jbig imageio ver. 3.0.2
Java info
java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
JVM Args used
java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
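To confirm that the -Xmx setting actually reached the JVM running the validation, a small check like the following can help. This is only a sketch: the class name HeapInfo is made up for illustration, and the figure reported by Runtime.maxMemory() can be somewhat lower than the -Xmx value on some JVMs.

```java
public class HeapInfo {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (MiB):   " + maxHeapMiB());
        // Heap currently reserved by the JVM, and the unused part of it.
        System.out.println("total heap (MiB): " + rt.totalMemory() / MIB);
        System.out.println("free heap (MiB):  " + rt.freeMemory() / MIB);
    }
}
```

Running it with the same flags as the validator (for example java -Xmx2048m HeapInfo) shows whether the limit is the one you expect.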
Example pdf
Console Output
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
Sample code
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
    private final File file;
    private List<ValidationError> errorList = new ArrayList<ValidationError>();

    public Validator(File file) {
        this.file = file;
    }

    public List<ValidationError> getErrors() {
        return errorList;
    }

    // Returns true if validation reported at least one error.
    public boolean validate() throws Exception {
        PreflightDocument document = null;
        try {
            PreflightParser parser = new PreflightParser(file);
            parser.parse();
            document = parser.getPreflightDocument();
            document.validate();
            ValidationResult result = document.getResult();
            errorList = result.getErrorsList();
        } finally {
            // Always release the document, even if validation throws.
            if (document != null) {
                try {
                    document.close();
                } catch (Exception ignored) {
                    // best-effort close
                }
            }
        }
        return !errorList.isEmpty();
    }
}
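When I added these options:
-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g
it failed again. I used VisualVM to analyze the heap dump and found something interesting: the dump was mostly char[] data.
I found this code in Preflight: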
// org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
{
    COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
    COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
    if (groupDictionary != null)
    {
        String sVal = groupDictionary.getNameAsString(COSName.S);
        if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
        {
            context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                    "Group has a transparency S entry or the S entry is null"));
        }
    }
}
This creates a ValidationError object, whose constructor looks like this:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        this.details = sb.toString();
    }
    this.cause = cause;
    t = new Exception();
}
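As you can see, every time an error occurs it creates a ValidationError, which in turn creates a StringBuilder.
So you have three ways to solve this:
- Increase the heap size. 4G is not enough; try 16G or more.
- Don't use the PDFBox library.
- Change the PDFBox source code.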
// commonDetailMap is assumed to be a static Map<String, String> field added to the class.
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        String key = errorCode + details;
        if (commonDetailMap.containsKey(key)) {
            // Reuse the string that was already built for this code and detail.
            this.details = commonDetailMap.get(key);
        } else {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            this.details = sb.toString();
            commonDetailMap.put(key, this.details);
        }
    }
    this.cause = cause;
    t = new Exception();
}
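I think using a Map to avoid creating too many StringBuilders would work. But if the error codes and details take many distinct values, the map will grow too large.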
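If the unbounded map is the worry, one option is to cap its size. The sketch below is an assumption of mine, not PDFBox code: BoundedDetailCache and canonical() are hypothetical names, and it relies on java.util.LinkedHashMap in access order so that the least recently used string is evicted once the cap is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedDetailCache {
    private final Map<String, String> cache;

    public BoundedDetailCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // Called after each put; evict the eldest entry once over the cap.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached instance equal to value, caching value if it is new.
    public synchronized String canonical(String value) {
        String cached = cache.get(value);
        if (cached != null) {
            return cached;
        }
        cache.put(value, value);
        return value;
    }

    public synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedDetailCache cache = new BoundedDetailCache(2);
        String first = cache.canonical(new String("a"));
        String again = cache.canonical(new String("a"));
        System.out.println(first == again); // true: the second call reuses the first instance
        cache.canonical("b");
        cache.canonical("c");               // exceeds the cap, so "a" is evicted
        System.out.println(cache.size());   // 2
    }
}
```

The trade-off is that an evicted string gets rebuilt the next time it is needed, so the cap should be larger than the number of distinct messages you expect to see often.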
So another way to change the source code is:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        // invoke intern
        this.details = sb.toString().intern();
    }
    this.cause = cause;
    t = new Exception();
}
The Javadoc for intern() says:
Returns a canonical representation for the string object.
I think using intern() is better.
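To illustrate what intern() changes, here is a small self-contained sketch. The build() helper only mimics the StringBuilder concatenation from the constructor above; the class name InternDemo and the prefix text are placeholders, not real Preflight names or messages.

```java
public class InternDemo {
    // Concatenates prefix and details the same way the constructor does.
    static String build(String prefix, String details) {
        StringBuilder sb = new StringBuilder(prefix.length() + details.length() + 2);
        sb.append(prefix).append(", ").append(details);
        return sb.toString();
    }

    public static void main(String[] args) {
        String details = "Group has a transparency S entry or the S entry is null";
        String a = build("placeholder prefix", details);
        String b = build("placeholder prefix", details);

        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: two separate String objects
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```

Note that interning only pays off when many errors carry exactly the same text; if every detail string is different, nothing is shared.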