Is there an efficient way to write several HTML strings to a PDF document in Java?

Question

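I am querying API pages in order to create a PDF of the information on each page. The API pages are first parsed into "TestCase" objects. Many of the field values in a TestCase object are HTML strings. Is there a time-efficient way to write these HTML strings to a new PDF document?

I am currently using iTextPDF and its XMLWorkerHelper to parse the HTML strings and write them to a PDF document. The problem I am running into is that, because there are so many fields that I have to write as individual HTML strings, this step takes about 5-6 seconds per PDF document, while the rest of the program takes only about 3 or 4. Even worse, when I export the Maven project as a jar, the makePDF step takes 20 seconds per TestCase object. This step is far slower than any other step (including querying the API for the values and reading them into TestCase objects). I have tried collecting all of the HTML strings and putting them in one big string to read from, in case the issue was that I was creating several instances of the XMLWorkerHelper to write the InputStream made from the HTML strings; however, this did not speed up that step.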

public void makePDF(TestCase tc) throws IOException, DocumentException {
        OutputStream file = new FileOutputStream(filename);
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, file);
        document.open();
        List<FieldValue> values = tc.getFieldValues();
        for (int i = 0; i < values.size(); ++i) {
            FieldValue fv = values.get(i);
            InputStream is = new ByteArrayInputStream(fv.getValue());
            XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
            document.add(new Paragraph("\n"));
        }
        document.close();
}

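I still need the HTML strings to print properly to the PDF document, but I would like to cut the time this takes as far as possible. In many cases I feed 20 or 30 TestCase objects into this function (in some cases as many as 500 at a time), so making this process take less time to run is important, because people using the tool do not want to wait 6 or 7 minutes for a few PDFs. Any suggestions are greatly appreciated.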

Answer 1

In a comment you said

Also, after running it with a test case that has none of the fields set (there are about 35 fields), where the only thing I add to the document is the field name, it still takes 20 seconds to write it all to the document.

To test this, I used the following code (essentially your code, with the field values generated on the fly and a constant field count):

int fieldCount = 35;
long start = System.nanoTime();

OutputStream file = new FileOutputStream(filename);
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
for (int i = 0; i < fieldCount; ++i) {
    InputStream is = new ByteArrayInputStream(("<p>" + "Value " + i + "</p>").getBytes());
    XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
    document.add(new Paragraph("\n"));
}
document.close();

long end = System.nanoTime();
// Subtract the long timestamps before converting; casting each one to float discards precision.
System.out.printf("Created %s with %d fields in %f seconds.%n", filename.getName(), fieldCount, (end - start) / 1e9);

(TimingXmlWorker test testMakePdfLikeEvanV)

Output:

Created MakePdfLikeEvanV.pdf with 35 fields in 3.221226 seconds.

You also said

I've tried collecting all of the HTML strings and putting them in one big string to read from in case the issue was that I was creating several instances of the XMLWorkerHelper to write the InputStream made from the HTML strings however this did not speed up that step.

I tested this as follows:

int fieldCount = 10000;
long start = System.nanoTime();

OutputStream file = new FileOutputStream(filename);
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
StringBuilder xmlString = new StringBuilder();
for (int i = 0; i < fieldCount; ++i) {
    xmlString.append("<p>")
             .append(("Value " + i))
             .append("</p>");
}
InputStream is = new ByteArrayInputStream(xmlString.toString().getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();

long end = System.nanoTime();
// Subtract the long timestamps before converting; casting each one to float discards precision.
System.out.printf("Created %s with %d fields in %f seconds.%n", filename.getName(), fieldCount, (end - start) / 1e9);

(TimingXmlWorker test testMakePdfLikeEvanVSingleWorkerCall)

Output:

Created MakePdfLikeEvanVSingleWorkerCall.pdf with 10000 fields in 1.610613 seconds.

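Thus, I cannot reproduce the issue.

If you happen to run the code on a slow computer, the difference in the first case (a separate worker call for each field) may be explicable. But in the second case (a single worker call for all fields) your observation of "no speed up" deviates completely from mine: there I had to increase the number of "fields" considerably to get a run time above one second.

Thus, some factor you have not mentioned must be acting as the brake.

Are you maybe storing to a network file system which takes extra time for permission checks and transfer?
Or is the List<FieldValue> values you retrieve via tc.getFieldValues() actually executing a web service request for each values.size() and values.get(i) call?
Or is fv.getValue() executing such a web service request?
Or...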
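
One way to narrow these possibilities down is to time each stage of makePDF separately. The following is only a minimal sketch under stated assumptions: StageTimer is a hypothetical helper, not part of iText, and the lambdas in main are stand-ins for tc.getFieldValues(), fv.getValue() and the parseXHtml call, which you would substitute with the real calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of iText): accumulates wall-clock time per named stage. */
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs the step, adds its elapsed time to the named stage, and returns the step's result. */
    <T> T time(String stage, Supplier<T> step) {
        long start = System.nanoTime();
        try {
            return step.get();
        } finally {
            // Subtract the long timestamps directly; converting them to float first would lose precision.
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total seconds recorded for a stage, or 0 if the stage never ran. */
    double seconds(String stage) {
        return nanosByStage.getOrDefault(stage, 0L) / 1e9;
    }

    /** Prints one line per stage, in the order the stages were first seen. */
    void report() {
        nanosByStage.forEach((stage, nanos) ->
                System.out.printf("%-16s %.3f s%n", stage, nanos / 1e9));
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();

        // Stand-in for tc.getFieldValues(). Copying into a plain ArrayList forces
        // any lazy or remote loading to happen here rather than inside the loop.
        List<String> values = timer.time("getFieldValues",
                () -> new ArrayList<String>(Arrays.asList("<p>Value 0</p>", "<p>Value 1</p>")));

        for (String html : values) {
            // Stand-in for fv.getValue() plus the conversion to bytes.
            byte[] bytes = timer.time("getValue", () -> html.getBytes(StandardCharsets.UTF_8));
            // Stand-in for the XMLWorkerHelper parseXHtml call on those bytes.
            timer.time("parseXHtml", () -> bytes.length);
        }
        timer.report();
    }
}
```

If nearly all of the time lands in the getFieldValues or getValue stage, the slowdown lies in how the TestCase data is retrieved rather than in the PDF writing. If it lands in the parsing stage even when writing to a local file, the next thing to compare is the environment in which the jar runs.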

html

java

performance

itext