What is the best way to make a PDF report with more than 100 plots in Python?

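I need a PDF report with a lot of plots. Most of them will be created with matplotlib in a loop, but I also need to include pandas plots, DataFrames (the whole view), and seaborn plots. So far I have explored the following solutions: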

So my question is: is there a quick and easy way to put all of these plots (ideally alongside the code that generates them) into a decent-looking PDF?

My suggestion is to use matplotlib's savefig with a BytesIO buffer (or, for all 100 plots, to keep the buffers in a list or a similar data structure). You can then insert those image buffers into a PDF with a library such as reportlab (see the reportlab website and docs). I regularly use this approach to create PowerPoint documents with the python-pptx library, and I have also verified that it works for PDF with reportlab. The reportlab library is very powerful and a bit "low level", so there may be a small learning curve when getting started, but it should meet your needs. A simple getting-started tutorial is also available. reportlab is BSD-licensed and available through both pip and conda.
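As a minimal sketch of the buffer idea described above: the helper name `figure_to_buffer` and the three-plot loop are illustrative assumptions, not part of the original answer. Because pandas and seaborn plots are drawn on matplotlib figures, the same helper should in principle work for them too, provided you pass it the underlying `Figure` object.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless rendering, no GUI window needed
import matplotlib.pyplot as plt


def figure_to_buffer(fig, dpi=300):
    """Render a matplotlib Figure to an in-memory PNG, then close the figure."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    buf.seek(0)      # rewind so the next reader starts at the first byte
    plt.close(fig)   # free the figure's memory; important with 100+ plots
    return buf


# Collect one buffer per plot; 3 stands in for the 100+ plots of the question.
buffers = []
for i in range(3):
    fig, ax = plt.subplots(figsize=(7, 2.25))
    ax.plot([0, 1, 2], [i, i + 1, i + 2])
    ax.set_title("Plot {}".format(i))
    buffers.append(figure_to_buffer(fig))
```

Each entry in `buffers` can then be handed to whatever library assembles the final document.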

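Anyway, here is my code snippet.
Sorry that it's a bit long, but it includes some helper functions for adding text and generating dummy images. You should be able to copy and paste it directly.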

The code generates a PDF with a title, two sample plots, and a line of explanatory text after each plot.

import io

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch

import numpy as np
import matplotlib.pyplot as plt


def plot_hist():
    """Create a sample scatter plot and return a BytesIO buffer with the image.

    Returns
    -------
    BytesIO : in-memory buffer with the plot image (PNG); can be passed to
        reportlab or elsewhere
    """
    # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
    plt.figure(figsize=(7, 2.25))

    N = 100
    r0 = 0.6
    x = 0.9 * np.random.rand(N)
    y = 0.9 * np.random.rand(N)
    area = (20 * np.random.rand(N))**2  # 0 to 10 point radii
    c = np.sqrt(area)
    r = np.sqrt(x * x + y * y)
    area1 = np.ma.masked_where(r < r0, area)
    area2 = np.ma.masked_where(r >= r0, area)
    plt.scatter(x, y, s=area1, marker='^', c=c)
    plt.scatter(x, y, s=area2, marker='o', c=c)
    # Show the boundary between the regions:
    theta = np.arange(0, np.pi / 2, 0.01)
    plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))

    # create buffer and save image to buffer
    # dpi should match the dpi of your PDF (I think 300 is typical); otherwise the image won't look good
    buf = io.BytesIO()
    plt.savefig(buf, format='png', dpi=300)
    buf.seek(0)
    # you'll want to close the figure once it's saved to the buffer
    plt.close()

    return buf

def add_text(text, style="Normal", fontsize=12):
    """Add text, with some spacing around it, to the PDF report (the global Story list).

    Parameters
    ----------
    text : str
        The string to print to PDF

    style : str
        The reportlab style

    fontsize : int
        The fontsize for the text
    """
    Story.append(Spacer(1, 12))
    ptext = "<font size={}>{}</font>".format(fontsize, text)
    Story.append(Paragraph(ptext, styles[style]))
    Story.append(Spacer(1, 12))

# Use basic styles and the SimpleDocTemplate to get started with reportlab
styles = getSampleStyleSheet()
doc = SimpleDocTemplate("form_letter.pdf", pagesize=letter,
                        rightMargin=inch / 2, leftMargin=inch / 2,
                        topMargin=72, bottomMargin=18)

# The "story" just holds "instructions" on how to build the PDF
Story = []

add_text("My Report", style="Heading1", fontsize=24)

# See plot_hist for information on how to get BytesIO object of matplotlib plot
# This code uses reportlab's Image flowable to add any valid PIL input to the report
image_buffer1 = plot_hist()
im = Image(image_buffer1, 7*inch, 2.25*inch)
Story.append(im)

add_text("This text explains something about the chart.")

image_buffer2 = plot_hist()
im = Image(image_buffer2, 7*inch, 2.25*inch)
Story.append(im)

add_text("This text explains something else about another chart.")

# This command will actually build the PDF
doc.build(Story)

# Close the open buffers only after the PDF has been built, since reportlab
# reads the image data during doc.build(); a "with" statement can do this for you
image_buffer1.close()
image_buffer2.close()