How to concatenate crosstabs when reading in a for loop in pandas
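I am using the Pandas module in Python 3.5 to read data files recursively from subdirectories. I want to combine the crosstabs inside the for loop after calling pd.crosstab(), and then write the output to an Excel file once the loop has finished. After calling pd.crosstab() I tried copying table1 to table3 (see the code below), but if some values are not present in later data files, table3 shows NaN for those entries. I looked at pd.concat but could not find an example of how to use it in a for loop.
The data files look like this (there are 100+ files with many columns, but only the columns I am interested in are shown here):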
First Data File
StudentID Grade
3 A
2 B
1 A
Second Data File
StudentID Grade
1 B
2 A
3 A
Third Data File
StudentID Grade
2 C
1 B
3 A
and so on ....
At the end the output should be like:
Grade A B C
StudentID
1 1 2 0
2 1 1 1
3 3 0 0
My Python program looks like this (imports at the top of the file removed):
.....
fields = ['StudentID', 'Grade']
path = 'C:/script_testing/'
i = 0
for filename in glob.glob('C:/script_testing/**/*.txt', recursive=True):
    temp = pd.read_csv(filename, sep=',', usecols=fields)
    table1 = pd.crosstab(temp.StudentID, temp.Grade)
    # Note the if condition is executed only once to initialize table3
    if(i==0):
        table3 = table1
        i = i + 1
    table3 = table3 + table1
writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()
pd.concat([df1, df2, df3]).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))
Grade A B C
StudentID
1 1 2 0
2 1 1 1
3 3 0 0
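For anyone who wants to reproduce the table above without the data files, here is a minimal sketch. It assumes df1, df2 and df3 hold the three sample files from the question; the frames are built by hand here, and the names combined and result are only illustrative.

```python
import pandas as pd

# Hand-built stand-ins for the three sample data files in the question.
df1 = pd.DataFrame({"StudentID": [3, 2, 1], "Grade": ["A", "B", "A"]})
df2 = pd.DataFrame({"StudentID": [1, 2, 3], "Grade": ["B", "A", "A"]})
df3 = pd.DataFrame({"StudentID": [2, 1, 3], "Grade": ["C", "B", "A"]})

# Stack the raw rows first, then count once over the combined frame.
combined = pd.concat([df1, df2, df3], ignore_index=True)
result = pd.crosstab(combined.StudentID, combined.Grade)
print(result)
```

Because the counting happens after the rows are stacked, a grade that is absent from some files never needs to be aligned between separate tables, so no NaN entries arise.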
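My attempt at translating your code:
import glob
import pandas as pd

fields = ['StudentID', 'Grade']
path = 'C:/script_testing/'
i = 0
# Read only the needed columns from one file
parse = lambda f: pd.read_csv(f, usecols=fields)
# Stack all files into one frame, then build a single crosstab
table3 = pd.concat(
    [parse(f) for f in glob.glob('C:/script_testing/**/*.txt', recursive=True)]
).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))
writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()  # newer pandas releases use writer.close() instead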
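If you would rather keep one crosstab per file and sum them inside the loop, as the question attempted, the NaN entries most likely come from the + operator: it aligns the two tables on index and columns and yields NaN wherever a label exists on only one side. The question's loop also appears to add the first file twice, once when table3 is initialized and again in the addition that follows. Below is a rough sketch of one way around both issues using DataFrame.add with fill_value=0. The helper name accumulate_crosstabs is hypothetical, and the in-memory frames stand in for what pd.read_csv would return.

```python
import pandas as pd


def accumulate_crosstabs(frames):
    """Sum one crosstab per frame; hypothetical helper, not from the original post."""
    total = None
    for frame in frames:
        table = pd.crosstab(frame.StudentID, frame.Grade)
        if total is None:
            total = table  # first file: start the running total, do not add it again
        else:
            # fill_value=0 treats a cell missing on one side as 0 instead of NaN
            total = total.add(table, fill_value=0)
    if total is None:
        raise ValueError("no data frames were supplied")
    # Cells missing on both sides are still NaN after add(); zero them out.
    return total.fillna(0).astype(int)


# Stand-ins for the frames that pd.read_csv would return inside the loop.
frames = [
    pd.DataFrame({"StudentID": [3, 2, 1], "Grade": ["A", "B", "A"]}),
    pd.DataFrame({"StudentID": [1, 2, 3], "Grade": ["B", "A", "A"]}),
    pd.DataFrame({"StudentID": [2, 1, 3], "Grade": ["C", "B", "A"]}),
]
total = accumulate_crosstabs(frames)
print(total)
```

In a real run you could pass a generator such as (pd.read_csv(f, usecols=fields) for f in glob.glob(...)) so that only one file is held in memory at a time, which may matter with 100+ files.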