How to concatenate crosstabs when reading in a for loop in pandas

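I am using the pandas module in Python 3.5 to read crosstabs recursively from subdirectories. I want to concatenate the crosstabs inside the for loop, after each call to pd.crosstab(), and then write the output to an Excel file once the loop has finished. I tried copying table1 into table3 after calling pd.crosstab() (see the code below), but if some values are absent from later data files, table3 shows NaN for those entries. I looked at pd.concat but could not find an example of how to use it in a for loop.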

The data files look like this (there are 100+ files with many columns, but only the columns I am interested in are shown here):

    First Data File
    StudentID    Grade      
    3            A
    2            B
    1            A

    Second Data File
    StudentID   Grade
    1            B
    2            A
    3            A

    Third Data File
    StudentID   Grade
    2            C
    1            B
    3            A

    and so on ....
    At the end the output should be like:

    Grade       A   B   C
    StudentID
    1           1   2   0
    2           1   1   1
    3           3   0   0   

My Python program looks like this (imports removed from the top of the file):

.....

fields = ['StudentID', 'Grade']
path= 'C:/script_testing/'
i=0

for filename in glob.glob('C:/script_testing/**/*.txt', recursive=True):
    temp = pd.read_csv(filename, sep=',', usecols=fields)
    table1 = pd.crosstab(temp.StudentID, temp.Grade)
    # Note: the if condition is executed only once, to initialize table3
    if(i==0):
        table3 = table1
        i = i + 1
    table3 = table3 + table1

writer = pd.ExcelWriter('Report.xlsx', engine='xlsxwriter')
table3.to_excel(writer, sheet_name='StudentID_vs_Grade')
writer.save()

Concatenate the data frames first, then build a single crosstab from the combined rows:

pd.concat([df1, df2, df3]).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))

Grade      A  B  C
StudentID         
1          1  2  0
2          1  1  1
3          3  0  0
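
To check the one-liner above without any files on disk, the three sample files from the question can be rebuilt as small data frames. The names df1, df2 and df3 follow the snippet; the values are copied from the question's sample data, and the intermediate name combined is introduced here only for illustration. This is a minimal sketch, not the original poster's full program.

```python
import pandas as pd

# Sample data copied from the three files shown in the question
df1 = pd.DataFrame({'StudentID': [3, 2, 1], 'Grade': ['A', 'B', 'A']})
df2 = pd.DataFrame({'StudentID': [1, 2, 3], 'Grade': ['B', 'A', 'A']})
df3 = pd.DataFrame({'StudentID': [2, 1, 3], 'Grade': ['C', 'B', 'A']})

# Stack the raw rows first, then count once over the combined data,
# so no per-file tables ever have to be aligned against each other
combined = pd.concat([df1, df2, df3], ignore_index=True)
table = pd.crosstab(combined.StudentID, combined.Grade)
print(table)
```

Because the counting happens only once, a grade that is missing from some files simply contributes nothing, and the table is filled with 0 rather than NaN.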

My attempt at translating your code:

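import glob
import pandas as pd

fields = ['StudentID', 'Grade']

# Read only the two columns of interest from one file
parse = lambda f: pd.read_csv(f, usecols=fields)

# Stack the rows of every file into one frame, then tabulate once
table3 = pd.concat(
    [parse(f) for f in glob.glob('C:/script_testing/**/*.txt', recursive=True)]
).pipe(lambda d: pd.crosstab(d.StudentID, d.Grade))

# The context manager saves the workbook on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('Report.xlsx', engine='xlsxwriter') as writer:
    table3.to_excel(writer, sheet_name='StudentID_vs_Grade')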
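
If the crosstabs really do have to be combined inside the for loop, it may help to know where the NaN values come from: the + operator aligns both tables on index and columns, and any cell that exists in only one of them becomes NaN. One possible alternative, sketched below, is DataFrame.add with fill_value=0. The in-memory files list is a hypothetical stand-in for the real .txt files, and total is an illustrative name for what the question calls table3.

```python
import io
import pandas as pd

fields = ['StudentID', 'Grade']

# Hypothetical stand-ins for the files on disk (values from the question)
files = [
    "StudentID,Grade\n3,A\n2,B\n1,A\n",
    "StudentID,Grade\n1,B\n2,A\n3,A\n",
    "StudentID,Grade\n2,C\n1,B\n3,A\n",
]

total = None
for text in files:
    temp = pd.read_csv(io.StringIO(text), usecols=fields)
    table1 = pd.crosstab(temp.StudentID, temp.Grade)
    if total is None:
        total = table1  # first file: nothing to add yet
    else:
        # fill_value=0 treats a cell missing from one table as 0 instead of NaN
        total = total.add(table1, fill_value=0)

# A cell missing from both operands would still be NaN; zero it and restore integers
total = total.fillna(0).astype(int)
print(total)
```

Note that the loop in the question also adds the first table to itself, because table3 = table3 + table1 runs right after the initialization; the if/else above avoids that double count.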