Java iText HTML to PDF <pre> block formatting
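I am using iText to convert a set of HTML files to PDF. My HTML files contain code snippets inside <pre> blocks, but iText does not preserve their formatting.
An example of one of my <pre> blocks: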
<something>
    <somethingelse>
        some content
    </somethingelse>
</something>
This is what iText writes to the PDF:
<something> <somethingelse> some content </somethingelse> </something>
Is there a way to configure iText so that it formats these blocks correctly?
My iText code snippet:
FileOutputStream os = new FileOutputStream(...);
Document doc = new Document(PageSize.A4);
PdfWriter writer = PdfWriter.getInstance(doc, os);
CSSResolver cssResolver = XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
HtmlPipelineContext htmlContext = new HtmlPipelineContext();
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.setImageProvider(new AbstractImageProvider() {
    public String getImageRootPath() {
        ...
    }
});
Pipeline<?> pipeline = new CssResolverPipeline(cssResolver,
        new HtmlPipeline(htmlContext,
                new PdfWriterPipeline(doc, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser parser = new XMLParser(worker);
doc.open();
for (String inputFile : inputFiles) {
    parser.parse(new FileInputStream(inputFile), StandardCharsets.UTF_8);
}
doc.close();
You can implement your own TagProcessor and register it with the TagProcessorFactory:
[...]
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
TagProcessorFactory tagFactory = Tags.getHtmlTagProcessorFactory();
tagFactory.addProcessor(new TagProcessor() {
    @Override
    public List<Element> startElement(WorkerContext ctx, Tag tag) {
        return null;
    }

    @Override
    public List<Element> content(WorkerContext ctx, Tag tag, String content) {
        return null;
    }

    @Override
    public List<Element> endElement(WorkerContext ctx, Tag tag, List<Element> currentContent) {
        return null;
    }

    @Override
    public boolean isStackOwner() {
        return false;
    }
}, "pre");
htmlContext.setTagFactory(tagFactory);
[...]
You can then use the Tag object to create iText elements and return them in a List. How you format and process the content is entirely up to you.
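As a rough illustration of the "up to you" part, the sketch below shows one way the text handed to content(...) could be prepared before it is turned into iText elements. It uses only the JDK; the class name PreTextLines, the method split, and the choice of four non-breaking spaces per tab are assumptions made for this example and are not part of iText. Each returned line could then be wrapped in whatever iText element you prefer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: splits <pre> text into lines and
// replaces leading indentation with non-breaking spaces so that later
// whitespace collapsing cannot remove it.
final class PreTextLines {
    private static final String NBSP = "\u00A0";
    // Assumption for this sketch: one tab is rendered as four spaces.
    private static final String TAB = NBSP + NBSP + NBSP + NBSP;

    static List<String> split(String content) {
        List<String> lines = new ArrayList<>();
        // The -1 limit keeps trailing empty lines instead of dropping them.
        for (String line : content.split("\r?\n", -1)) {
            StringBuilder sb = new StringBuilder();
            int i = 0;
            // Convert only the leading run of spaces and tabs.
            while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
                sb.append(line.charAt(i) == '\t' ? TAB : NBSP);
                i++;
            }
            sb.append(line.substring(i));
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String pre = "<something>\n  <somethingelse>\n    some content";
        for (String line : split(pre)) {
            System.out.println(line);
        }
    }
}
```

Only leading whitespace is converted, so ordinary spaces between words stay breakable and long lines can still wrap.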
The following snippet (based on your snippet and the XMLWorker documentation) creates a PDF containing a <pre> block.
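import java.io.FileOutputStream;
import java.io.StringReader;

import com.itextpdf.text.Document;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.Pipeline;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public class HtmlToPdf {
    // proper exception handling needs to be implemented
    public static void main(String[] args) throws Exception {
        Document document = new Document(PageSize.A4);
        PdfWriter pdfWriter = PdfWriter.getInstance(document,
                new FileOutputStream("r:/temp/testpdf.pdf")
        );
        CSSResolver cssResolver = XMLWorkerHelper.getInstance()
                .getDefaultCssResolver(true);
        HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
        htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
        // Chain: resolve CSS, then process HTML tags, then write to the PDF
        Pipeline<?> pipeline = new CssResolverPipeline(cssResolver,
                new HtmlPipeline(htmlContext,
                        new PdfWriterPipeline(document, pdfWriter)
                )
        );
        XMLWorker worker = new XMLWorker(pipeline, true);
        XMLParser parser = new XMLParser(worker);
        document.open();
        String str = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \n"
                + " \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\" xml:lang=\"en\">\n"
                + " <head>\n"
                + " <title>sample html</title>\n"
                + " </head>\n"
                + " <body>\n"
                + " <h2>sample text</h2>\n"
                + " <pre>\n"
                + " <something>\n"
                + " <somethingelse>\n"
                + " some content\n"
                + " </somethingelse>\n"
                + " </something>\n"
                + " </pre>\n"
                + " </body>\n"
                + "</html>";
        parser.parse(new StringReader(str));
        document.close();
    }
}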
My code was correct. What was wrong was the HTML I was trying to convert. It looked like
<pre>
<code>
...
</code>
</pre>
instead of
<pre>
...
</pre>
iText does not like that nested <code> block.
It can easily be converted by creating a temporary copy of the input HTML file and calling
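// "file" is the temporary copy of the input HTML file
String text = FileUtils.readFileToString(file, StandardCharsets.UTF_8);
// strip opening <code ...> tags, then closing </code> tags
text = text.replaceAll("<code(.*?)>", "");
text = text.replaceAll("</code>", "");
FileUtils.writeStringToFile(file, text, StandardCharsets.UTF_8);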
FileUtils is part of org.apache.commons.io.
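If you would rather avoid the Commons IO dependency, the same clean-up can be sketched with the JDK alone. The version below is only an illustration: the class name CodeTagStripper and the method strip are made up for this example, and the patterns are slightly stricter than the ones above so that tags such as <codebase> are left alone.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes <code> wrappers from HTML text in memory.
final class CodeTagStripper {
    // Matches <code> and <code attr="...">, but not tags like <codebase>.
    private static final Pattern OPEN =
            Pattern.compile("<code(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    // Matches </code>, allowing optional whitespace before the closing bracket.
    private static final Pattern CLOSE =
            Pattern.compile("</code\\s*>", Pattern.CASE_INSENSITIVE);

    static String strip(String html) {
        String withoutOpen = OPEN.matcher(html).replaceAll("");
        return CLOSE.matcher(withoutOpen).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<pre>\n<code class=\"xml\">\n  some content\n</code>\n</pre>";
        System.out.println(strip(html));
    }
}
```

Because the result is a plain String, it could be passed to the parser through a StringReader, as in the HtmlToPdf example, instead of being written back to a temporary file.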