Nested loops to create docx file from Pandas dataframe

Question

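I have a dataframe of survey comments. One column holds each respondent's group number. Several other columns hold the question text in the header row and the responses in the rows below. Not everyone answered every question, so some cells are blank.

I want to use the docx package to output the comments to a Word file. For each question, I want the question text as a heading, the group number below it (with responses grouped by group number), and then that group's comments as a bulleted list, before moving on to the next question and repeating. I also don't want to output blank cells.

The code below illustrates what I am trying to do.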

import docx
import pandas as pd
from docx import Document
import numpy as np
from docx.shared import Inches
from docx.enum.section import WD_SECTION
from docx.enum.section import WD_ORIENT

# initialize list of lists 
data = [['Group 1', 'Comment A', 'Comment B', 'Comment C'], ['Group 2', 'Comment D', '', ''], ['Group 2', 'Comment E', '', 'Comment F'], ['Group 1', '', 'Comment G', 'Comment H'], ] 

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Group', 'Question 1', 'Question 2', 'Question 3']) 
print(df)

# create file
doc = Document()

sections = doc.sections
section = sections[0]

# Convert to landscape orientation
new_width, new_height = section.page_height, section.page_width
section.orientation = WD_ORIENT.LANDSCAPE
section.page_width = new_width
section.page_height = new_height

# Document Title
doc.add_heading('Document Title', level=0)

# Opening text
doc.add_paragraph('Some text...')

# Do I need to sort by 'Group' before doing the loops?

# loop through the questions - this isn't working
for column in df[2:]:
    # create a heading for each question
    doc.add_heading(column, level=1)
    for g in df.Group:
        # create a subheading for each group
        doc.add_heading(g, level=3)
        for c in df[g]:
            doc.add_paragraph(c, style='List Bullet')

# save the doc
doc.save('./test.docx')

The desired output is:

Document Title

Some text...

Question 1

Group 1
 - Comment A

Group 2
 - Comment D
 - Comment E

Question 2

Group 1
 - Comment B
 - Comment G

Question 3

Group 1
 - Comment C
 - Comment H

Group 2
 - Comment F

Answer 1

This works for the loop:

# loop through the questions
for column in df.columns[1:]:
    # create a heading for each question
    doc.add_heading(column, level=3)
    # make a new dataframe with only Group and the column of interest
    new_df = df[['Group', column]]
    # list the distinct groups, in order of first appearance
    unit_list = list(new_df['Group'].unique())
    # collect the comments in each group for this column
    for unit in unit_list:
        comments = [row[2] for row in new_df.itertuples() if row[1] == unit]
        # drop blank cells (assumes blanks are empty strings, not NaN)
        comments = [i for i in comments if len(i) > 0]
        # if this group has any comments, add the group as a subheading
        if len(comments) > 0:
            doc.add_heading(unit, level=4)
            # Bullet list of comments
            for c in comments:
                doc.add_paragraph(c, style='List Bullet')
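
If the blank cells might arrive as NaN rather than empty strings (for example after reading the survey from a file), len(i) would fail on them. The following is a minimal sketch of an alternative that uses groupby and drops both kinds of blanks. The helper names collect_comments and write_comments are hypothetical, not part of pandas or python-docx, and the heading levels 1 and 3 are taken from the question's code. The doc argument is assumed to be a python-docx Document, or any object with the same add_heading and add_paragraph methods.

```python
import pandas as pd


def collect_comments(df, group_col="Group"):
    """Return {question: {group: [comments]}}, skipping blank or missing cells.

    Hypothetical helper: every column except group_col is treated as a question.
    """
    result = {}
    for question in df.columns.drop(group_col):
        by_group = {}
        # sort=True orders the groups by label; rows keep their order within a group
        for group, answers in df.groupby(group_col, sort=True)[question]:
            # Drop NaN first, then drop empty or whitespace-only strings
            answers = answers.dropna().astype(str).str.strip()
            answers = answers[answers != ""]
            if not answers.empty:
                by_group[group] = answers.tolist()
        result[question] = by_group
    return result


def write_comments(doc, nested, question_level=1, group_level=3):
    """Write the nested dict as question heading, group subheading, bullet list."""
    for question, by_group in nested.items():
        doc.add_heading(str(question), level=question_level)
        for group, comments in by_group.items():
            doc.add_heading(str(group), level=group_level)
            for comment in comments:
                doc.add_paragraph(comment, style="List Bullet")


# Sample data from the question
data = [
    ["Group 1", "Comment A", "Comment B", "Comment C"],
    ["Group 2", "Comment D", "", ""],
    ["Group 2", "Comment E", "", "Comment F"],
    ["Group 1", "", "Comment G", "Comment H"],
]
df = pd.DataFrame(data, columns=["Group", "Question 1", "Question 2", "Question 3"])
nested = collect_comments(df)
```

With the doc object from the question, calling write_comments(doc, nested) before doc.save('./test.docx') should give the layout shown in the desired output. Groups with no comments for a question never enter the dict, so they get no subheading. Unlike unique(), groupby sorts the groups by label, which also covers the question's concern about sorting by 'Group' first.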
