Nested loops to create a docx file from a Pandas DataFrame
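I have a dataframe containing survey comments. One column holds each respondent's group number. Then there are several columns with the question text in the header row and the responses in the rows below. Not everyone answered every question, so there are blank cells.
I want to use the docx package to output the comments to a Word file. I want to show the question text as a heading, the group number below that heading (grouping the responses by group number), and the comments for that question as a bulleted list below the group, then move on to the next question and repeat. Also, I don't want to output the blank cells.
The code below illustrates what I am trying to do.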
import docx
import pandas as pd
from docx import Document
import numpy as np
from docx.shared import Inches
from docx.enum.section import WD_SECTION
from docx.enum.section import WD_ORIENT
# initialize list of lists
data = [['Group 1', 'Comment A', 'Comment B', 'Comment C'], ['Group 2', 'Comment D', '', ''], ['Group 2', 'Comment E', '', 'Comment F'], ['Group 1', '', 'Comment G', 'Comment H'], ]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Group', 'Question 1', 'Question 2', 'Question 3'])
print(df)
# create file
doc = Document()
sections = doc.sections
section = sections[0]
# Convert to landscape orientation
new_width, new_height = section.page_height, section.page_width
section.orientation = WD_ORIENT.LANDSCAPE
section.page_width = new_width
section.page_height = new_height
# Document Title
doc.add_heading('Document Title', level=0)
# Opening text
doc.add_paragraph('Some text...')
# Do I need to sort by 'Group' before doing the loops?
# loop through the questions - this isn't working
for column in df[2:]:
    # create a heading for each question
    doc.add_heading(column, level=1)
    for g in df.Group:
        # create a heading for each group
        doc.add_heading(g, level=3)
        for c in df[g]:
            doc.add_paragraph(c, style='List Bullet')
# save the doc
doc.save('./test.docx')
The desired output would be:
Document Title
Some text...
Question 1
Group 1
- Comment A
Group 2
- Comment D
- Comment E
Question 2
Group 1
- Comment B
- Comment G
Question 3
Group 1
- Comment C
- Comment H
Group 2
- Comment F
This works for the loops:
# loop through the questions
for column in df.columns[1:]:
    # create a heading for each question
    doc.add_heading(column, level=3)
    # Make a new dataframe with only Group and column of interest
    new_df = df[['Group', column]]
    # Make list of all units
    unit_list = list(new_df['Group'].unique())
    # Make list of comments in each unit for this column
    for unit in unit_list:
        comments = [row[2] for row in new_df.itertuples() if row[1] == unit]
        comments = [i for i in comments if len(i) > 0]
        # If there were any comments in this unit, add the unit as a subheader
        if len(comments) > 0:
            doc.add_heading(unit, level=4)
            # Bullet list of comments
            for c in comments:
                doc.add_paragraph(c, style='List Bullet')
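For context on why the attempt in the question fails: df[2:] slices rows rather than columns, iterating over a DataFrame yields its column labels (including 'Group'), and df[g] looks for a column named 'Group 1', which raises a KeyError. As an alternative to the loop above, here is a minimal sketch that lets groupby do the grouping. The helper names collect_comments and write_comments are hypothetical, not from the question or from python-docx. It assumes blank cells are empty strings, whitespace, or NaN, and it lists groups in sorted order rather than in order of first appearance. The heading levels 1 and 3 are the ones used in the question.

```python
import pandas as pd

# Sample data from the question.
data = [['Group 1', 'Comment A', 'Comment B', 'Comment C'],
        ['Group 2', 'Comment D', '', ''],
        ['Group 2', 'Comment E', '', 'Comment F'],
        ['Group 1', '', 'Comment G', 'Comment H']]
df = pd.DataFrame(data, columns=['Group', 'Question 1', 'Question 2', 'Question 3'])


def collect_comments(df, group_col='Group'):
    """Hypothetical helper: return {question: {group: [non-blank comments]}}."""
    result = {}
    for question in df.columns.drop(group_col):
        col = df[question]
        # Assumption: a cell is blank if it is NaN or an empty/whitespace string.
        keep = col.notna() & (col.astype(str).str.strip() != '')
        answered = df.loc[keep, [group_col, question]]
        # groupby sorts group labels by default; a group with no remaining
        # comments for this question simply does not appear.
        result[question] = {
            group: list(comments)
            for group, comments in answered.groupby(group_col)[question]
        }
    return result


def write_comments(doc, grouped):
    """Hypothetical helper: write the nested mapping to a python-docx Document."""
    for question, groups in grouped.items():
        doc.add_heading(question, level=1)
        for group, comments in groups.items():
            doc.add_heading(group, level=3)
            for comment in comments:
                doc.add_paragraph(comment, style='List Bullet')


grouped = collect_comments(df)
```

With python-docx installed, calling write_comments(doc, grouped) on the doc created in the question, followed by doc.save('./test.docx'), should produce the layout shown in the desired output. Keeping the grouping separate from the writing makes it possible to inspect grouped before anything is written to the file.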