Which is the best way to make a report in PDF with more than 100 plots with Python?

Question

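I need a PDF report with a lot of plots. Most of them will be created with matplotlib inside a loop, but I also need to include pandas plots and dataframes (the whole view) and seaborn plots. So far I have explored the following solutions:

PythonTex. I have already used it for other projects, but it would consume a lot of time, because you have to write \pythontexprint for each plot you want to display.
Use the savefig command in every iteration of the loop and save all the plots as images, to insert them all in LaTeX later. That would be a very time-consuming choice too. Another option is to use that command to save the plots as PDF and then merge all the PDFs. That would create an ugly report, since the plots would not fit the whole page.
Use RStudio with reticulate to create a Markdown report. The problem here is that I would need to learn how reticulate works, which takes time.
As far as I know, PyPDF does not fit my needs.
Create a jupyter notebook and then try to export it to PDF. Once again, I do not know how to use jupyter notebooks, and I read that I would have to convert first to HTML and then to PDF.
The solution from here: however, that question is three years old, and there may be better options by now.

So my question is the following: is there any easy and quick way to get all those plots (ideally alongside the code that generates them) into a decent-looking PDF?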

Answer 1

My suggestion is to use matplotlib's savefig to write each plot to a BytesIO buffer (or save the buffers to a list or a similar data structure for 100 plots). You can then use those image buffers to insert the images into a PDF with a library like reportlab (website here and docs here). I regularly use this approach to create PowerPoint documents with the python-pptx library, and I have also verified it for PDF with reportlab. The reportlab library is very powerful and a bit "low level", so there may be a small learning curve when getting started, but it surely meets your needs. There is a simple getting-started tutorial here. reportlab is BSD licensed and available on pip and conda.
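As a minimal sketch of the buffer-per-plot idea described above (not the full report code), the loop below renders several figures and collects their PNG buffers in a list. The helper name `figure_to_buffer`, the count of 5 and the dpi of 150 are illustrative assumptions; a real report would loop over its own 100+ plots.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def figure_to_buffer(fig, dpi=150):
    """Render a matplotlib figure to an in-memory PNG and close the figure.

    Hypothetical helper for illustration; the dpi value is an assumption.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    buf.seek(0)      # rewind so a reader starts at the first byte
    plt.close(fig)   # release the figure; this matters with 100+ plots
    return buf


rng = np.random.default_rng(0)
buffers = []
for i in range(5):  # illustrative count; use your real plot loop here
    fig, ax = plt.subplots(figsize=(7, 2.25))
    ax.plot(rng.random(50))
    ax.set_title(f"Plot {i + 1}")
    buffers.append(figure_to_buffer(fig))
```

pandas and seaborn plots are drawn on matplotlib figures, so passing `ax.get_figure()` from those plots to the same helper should work as well.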

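Anyway, here is my full code snippet.
Sorry, it's a bit long, but the code has some helper functions to print text and generate dummy images. You should be able to copy/paste it directly.

The code will generate a PDF that looks like this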

import io

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch

import numpy as np
import matplotlib.pyplot as plt


def plot_hist():
    """ Create a sample plot (a masked scatter plot, despite the function name) and return a BytesIO buffer with the plot

    Returns
    -------
    BytesIO : in memory buffer with plot image, can be passed to reportlab or elsewhere
    """    
    # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
    plt.figure(figsize=(7, 2.25))

    N = 100
    r0 = 0.6
    x = 0.9 * np.random.rand(N)
    y = 0.9 * np.random.rand(N)
    area = (20 * np.random.rand(N))**2  # 0 to 10 point radii
    c = np.sqrt(area)
    r = np.sqrt(x * x + y * y)
    area1 = np.ma.masked_where(r < r0, area)
    area2 = np.ma.masked_where(r >= r0, area)
    plt.scatter(x, y, s=area1, marker='^', c=c)
    plt.scatter(x, y, s=area2, marker='o', c=c)
    # Show the boundary between the regions:
    theta = np.arange(0, np.pi / 2, 0.01)
    plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))

    # create buffer and save image to buffer
    # dpi should match the dpi of your PDF; I think 300 is typical, otherwise the image won't look good
    buf = io.BytesIO()
    plt.savefig(buf, format='png', dpi=300)
    buf.seek(0)
    # you'll want to close the figure once it's saved to the buffer
    plt.close()

    return buf

def add_text(text, style="Normal", fontsize=12):
    """ Adds text with some spacing around it to the PDF report

    Parameters
    ----------
    text : str
        The string to print to PDF

    style : str
        The reportlab style

    fontsize : int
        The fontsize for the text
    """
    Story.append(Spacer(1, 12))
    ptext = "<font size={}>{}</font>".format(fontsize, text)
    Story.append(Paragraph(ptext, styles[style]))
    Story.append(Spacer(1, 12))

# Use basic styles and the SimpleDocTemplate to get started with reportlab
styles=getSampleStyleSheet()
doc = SimpleDocTemplate("form_letter.pdf",pagesize=letter,
                        rightMargin=inch/2,leftMargin=inch/2,
                        topMargin=72,bottomMargin=18)

# The "story" just holds "instructions" on how to build the PDF
Story=[]

add_text("My Report", style="Heading1", fontsize=24)

# See plot_hist for information on how to get a BytesIO object of a matplotlib plot
# This code uses the reportlab Image class to add any valid PIL input to the report
image_buffer1 = plot_hist()
im = Image(image_buffer1, 7*inch, 2.25*inch)
Story.append(im)

add_text("This text explains something about the chart.")

image_buffer2 = plot_hist()
im = Image(image_buffer2, 7*inch, 2.25*inch)
Story.append(im)

add_text("This text explains something else about another chart.")

# This command will actually build the PDF
doc.build(Story)

# should close open buffers, can use a "with" statement in python to do this for you
# if that works better
image_buffer1.close()
image_buffer2.close()
