PDF merging with itext and pdfbox

Question

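I have a multi-module Maven project with a request-generation process. This process includes several Vaadin upload components, through which we upload files that may only be PNG, JPG, PDF, or BMP. At the end of the process, I merge all the documents into a single PDF and then download it with a file downloader.

The function I call on the button click event is: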

   /**
     * This function is responsible for getting 
     * all documents from request and merge 
     * them in a single pdf file for 
     * download purposes
     * @throws Exception 
     */
    protected void downloadMergedDocument() throws Exception {

    // Calling create pdf function for merged pdf
    createPDF();

    // Setting the merged file as a resource for file downloader
    Resource myResource = new FileResource(new File (mergedReportPath +request.getWebProtocol()+ ".pdf"));
    FileDownloader fileDownloader = new FileDownloader(myResource);

    // Extending the download button for download   
    fileDownloader.extend(downloadButton);

}

/**
 * This function is responsible for providing 
 * the PDF related to a particular request that 
 * contains all the documents merged inside it 
 * @throws Exception
 */
private void createPDF() throws Exception {
    try{
        // Getting the current request
        request = evaluationRequestUI.getRequest();

        // Fetching all documents of the request            
        Collection<DocumentBean> docCollection = request.getDocuments();

        // Initializing Document of using itext library
        Document doc = new Document();

        // Setting PdfWriter for getting the merged images file
        PdfWriter.getInstance(doc, new FileOutputStream(mergedReportPath+ "/mergedImages_" + request.getWebProtocol()+ ".pdf"));

        // Opening document
        doc.open();

        /**
         * Here iterating on document collection for the images type   
         * document for merging them into one pdf    
         */                                        
        for (DocumentBean documentBean : docCollection) {
            byte[] documents = documentBean.getByteArray();

            if(documentBean.getFilename().toLowerCase().contains("png") ||
                    documentBean.getFilename().toLowerCase().contains("jpeg") ||
                    documentBean.getFilename().toLowerCase().contains("jpg") ||
                    documentBean.getFilename().toLowerCase().contains("bmp")){

                Image img = Image.getInstance(documents);

                doc.setPageSize(img);
                doc.newPage();
                img.setAbsolutePosition(0, 0);
                doc.add(img);
            }
        }

        // Closing the document
        doc.close();

        /**
         * Here we get all the images type documents merged into 
         * one pdf, now moving to pdfbox for searching the pdf related 
         * document types in the request and merging the above resultant      
         * pdf and the pdf document in the request into one pdf
         */

        PDFMergerUtility utility = new PDFMergerUtility();

        // Adding the above resultant pdf as a source 
        utility.addSource(new File(mergedReportPath+ "/mergedImages_" + request.getWebProtocol()+ ".pdf"));

        // Iterating for the pdf document types in the collection
        for (DocumentBean documentBean : docCollection) {
            byte[] documents = documentBean.getByteArray();

            if(documentBean.getFilename().toLowerCase().contains("pdf")){
                utility.addSource(new ByteArrayInputStream(documents));
            }
        }

        // Here setting the final pdf name
        utility.setDestinationFileName(mergedReportPath +request.getWebProtocol()+ ".pdf");

        // Here final merging and then result
        utility.mergeDocuments();

    }catch(Exception e){
        m_logger.error("CATCH", e);
        throw e;
    }
}

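Note: mergedReportPath is the path where the PDF files are stored and from which they are then retrieved for download.

Now I have two problems:

When I run this process for the first request, it creates the PDF in the destination folder, but it does not download it.
When I run this process again for a second request, it gets stuck at utility.mergeDocuments(); that is, if it finds that the PDF already exists in the destination folder, it hangs. I don't know where the problem is. Please help.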

Answer 1

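In the comments on your question, you clarified that you don't need the file on disk, but want to send the PDF to the browser, and you'd like to know how to achieve this. The official documentation explains it: How can I serve a PDF to a browser without storing a file on the server side?

This is how you create a PDF in memory: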

// step 1
Document document = new Document();
// step 2
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter.getInstance(document, baos);
// step 3
document.open();
// step 4
document.add(new Paragraph("Hello"));
// step 5
document.close();

Merging PDFs is done with PdfCopy: How to merge documents correctly? You need to apply the same principle as above to those examples: replace the FileOutputStream with a ByteArrayOutputStream.
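To illustrate why that swap works, here is a minimal sketch using only the JDK. The class InMemoryOutputSketch and the method writeReport are hypothetical stand-ins for the PDF-producing code (they are not iText API); the point is only that code written against OutputStream does not care whether the bytes end up in a file or in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class InMemoryOutputSketch {

    // Hypothetical stand-in for PDF generation: it depends only on OutputStream,
    // so the caller decides whether the bytes go to disk or stay in memory.
    static void writeReport(OutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    // Runs the generator against an in-memory buffer and returns the bytes.
    static byte[] renderToMemory(String text) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            writeReport(baos, text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = renderToMemory("Hello");
        System.out.println(bytes.length + " bytes held in memory, no file created");
    }
}
```

Passing a FileOutputStream to writeReport instead would write the same bytes to disk, which is the only thing that changes between the two approaches.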

Now you have the PDF bytes stored in the baos object. We can send them to the browser like this:

// setting some response headers
response.setHeader("Expires", "0");
response.setHeader("Cache-Control",
    "must-revalidate, post-check=0, pre-check=0");
response.setHeader("Pragma", "public");
// setting the content type
response.setContentType("application/pdf");
// the contentlength
response.setContentLength(baos.size());
// write ByteArrayOutputStream to the ServletOutputStream
OutputStream os = response.getOutputStream();
baos.writeTo(os);
os.flush();
os.close();

If you have further questions, be sure to read the documentation.

Answer 2

In PDFBox 2.0, you can set the output stream with setDestinationStream(). So you only need to call

response.setContentType("application/pdf");
OutputStream os = response.getOutputStream();
utility.setDestinationStream(os);
utility.mergeDocuments();
os.flush();
os.close();

You cannot set the response size this way; if you have to, use a ByteArrayOutputStream as in Bruno's answer or this one.
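The reason is ordering: the length has to be reported before the body is written, and when streaming directly the total is not known until the last byte has gone out. Here is a minimal JDK-only sketch of the buffering alternative. BufferedLengthSketch, BodyWriter, and sendWithLength are hypothetical names introduced for illustration; in a servlet, the length callback would presumably be response::setContentLength and the target response.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.IntConsumer;

final class BufferedLengthSketch {

    /** Hypothetical callback that produces the merged PDF bytes. */
    interface BodyWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Buffers the whole body first, reports its size, and only then
    // copies the bytes to the real target stream.
    static int sendWithLength(BodyWriter body, IntConsumer contentLength, OutputStream target) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            body.writeTo(baos);                 // body goes to memory, not to the client
            contentLength.accept(baos.size());  // size is known before anything is sent
            baos.writeTo(target);
            target.flush();
            return baos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A ByteArrayOutputStream stands in for the servlet response stream here.
        ByteArrayOutputStream fakeResponse = new ByteArrayOutputStream();
        int sent = sendWithLength(
                out -> out.write(new byte[] {1, 2, 3, 4}),
                len -> System.out.println("Content-Length: " + len),
                fakeResponse);
        System.out.println("bytes sent: " + sent);
    }
}
```

The trade-off is memory: the whole merged PDF is held in the buffer before the first byte reaches the client, so streaming directly may still be preferable for very large documents when the length header is not required.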

java

hibernate

itext

pdfbox

vaadin7