Group dataframe based on columns

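I'm new to data science, and I'd appreciate your help. My question is about grouping a dataframe by its columns so that I can plot a bar chart of the status counts for each subject.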

My CSV file looks like this:

Name,Maths,Science,English,sports
S1,Pass,Fail,Pass,Pass
S2,Pass,Pass,NA,Pass
S3,Pass,Fail,Pass,Pass
S4,Pass,Pass,Pass,NA
S5,Pass,Fail,Pass,NA
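One detail matters for every answer below: by default pd.read_csv treats the literal string NA as a missing value, so those cells become NaN and a plain groupby skips them. The following is a minimal sketch that shows this. It assumes the sample data is inlined as a string (CSV_TEXT is just an illustrative name) rather than read from a file:

```python
import io

import pandas as pd

# The sample data from the question, inlined so the snippet runs without a file
CSV_TEXT = """Name,Maths,Science,English,sports
S1,Pass,Fail,Pass,Pass
S2,Pass,Pass,NA,Pass
S3,Pass,Fail,Pass,Pass
S4,Pass,Pass,Pass,NA
S5,Pass,Fail,Pass,NA
"""

# Default behaviour: the string "NA" is one of pandas' built-in missing-value markers
df_default = pd.read_csv(io.StringIO(CSV_TEXT))
print(df_default.isna().sum())  # English has 1 NaN, sports has 2

# keep_default_na=False disables those markers, so "NA" stays an ordinary string
df_literal = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)
print(df_literal['English'].tolist())
```

Pick whichever behaviour suits the output you want. Keep NA as a label, or let it become NaN and handle it explicitly (fillna, or groupby(..., dropna=False)).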

Expected output:

Subject,Status,Count
Maths,Pass,5
Science,Pass,2
Science,Fail,3
English,Pass,4
English,NA,1
Sports,Pass,3
Sports,NA,2

PS: Going forward, always post the code you have tried so far.

To match your expected output (apart from the capitalisation of "sports"), you can do this:

import pandas as pd

# keep_default_na=False keeps the literal "NA" as a string instead of
# turning it into NaN, which groupby would otherwise drop
df = pd.read_csv('c:/temp/data.csv', keep_default_na=False)  # or wherever your CSV file is

subjects = ['Maths', 'Science', 'English', 'sports']  # or: df.columns.drop('Name')
grouped_rows = []
for eachsub in subjects:
    # number of students per status value in this subject
    rows = df.groupby(eachsub)['Name'].count()
    # emit statuses in a fixed order, skipping any that never occur
    for status in ['Pass', 'Fail', 'NA']:
        if status in rows.index:
            grouped_rows.append([eachsub, status, rows[status]])
new_df = pd.DataFrame(grouped_rows, columns=['Subject', 'Status', 'Count'])
print(new_df)

I'd like to avoid the for loop. My approach was just these two lines:

subjects = ['Maths', 'Science', 'English', 'sports']
# eachsub only exists inside the loop, so count every subject column at once instead
grouped_rows = df[subjects].apply(pd.Series.value_counts)

Depending on your application, you already have the data available in grouped_rows.
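A loop-free way to get the long Subject/Status/Count layout, including the missing-value rows, is sketched below. It assumes pandas 1.1 or newer (needed for groupby's dropna parameter) and inlines the sample data as CSV_TEXT, an illustrative name. Missing values are kept as their own group and then relabelled "NA" to match the question:

```python
import io

import pandas as pd

CSV_TEXT = """Name,Maths,Science,English,sports
S1,Pass,Fail,Pass,Pass
S2,Pass,Pass,NA,Pass
S3,Pass,Fail,Pass,Pass
S4,Pass,Pass,Pass,NA
S5,Pass,Fail,Pass,NA
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))  # "NA" cells are parsed as NaN here
subjects = ['Maths', 'Science', 'English', 'sports']

# One row per (student, subject) pair
long_df = df.melt(id_vars='Name', value_vars=subjects,
                  var_name='Subject', value_name='Status')

# dropna=False keeps NaN statuses as a group instead of silently dropping them
counts = (long_df.groupby(['Subject', 'Status'], dropna=False)
                 .size()
                 .reset_index(name='Count'))

# Relabel the missing statuses for display
counts['Status'] = counts['Status'].fillna('NA')
print(counts)
```

The row order follows groupby's sorting (subjects alphabetically), not the order in the question. Sort or reindex afterwards if the order matters.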

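You can also do this with pandas alone. The output format differs slightly from the one in the question (missing values are labelled "Unknown" instead of "NA"), but the information is the same: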

import pandas as pd

# reading csv
df = pd.read_csv("input.csv")

# turning columns into rows
melt_df = pd.melt(df, id_vars=['Name'], value_vars=['Maths', 'Science', "English", "sports"], var_name="Subject", value_name="Status")

# filling NaN values, otherwise the below groupby will ignore them.
melt_df = melt_df.fillna("Unknown")

# counting per group of subject and status.
result_df = melt_df.groupby(["Subject", "Status"]).size().reset_index(name="Count")
print(result_df)

Then you get the following result:

   Subject   Status  Count
0  English     Pass      4
1  English  Unknown      1
2    Maths     Pass      5
3  Science     Fail      3
4  Science     Pass      2
5   sports     Pass      3
6   sports  Unknown      2
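Since the goal is a bar chart per subject, it can help to pivot this long table into a wide one: one row per subject and one column per status. The following is a sketch of that step, assuming the sample data is inlined as CSV_TEXT (an illustrative name) and reusing the melt/groupby steps above. The plotting call is left commented out because it needs matplotlib installed:

```python
import io

import pandas as pd

CSV_TEXT = """Name,Maths,Science,English,sports
S1,Pass,Fail,Pass,Pass
S2,Pass,Pass,NA,Pass
S3,Pass,Fail,Pass,Pass
S4,Pass,Pass,Pass,NA
S5,Pass,Fail,Pass,NA
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# Same steps as above: long format, NaN relabelled, counts per group
melt_df = pd.melt(df, id_vars=['Name'], var_name='Subject', value_name='Status').fillna('Unknown')
result_df = melt_df.groupby(['Subject', 'Status']).size().reset_index(name='Count')

# One row per subject, one column per status; combinations that never occur become 0
wide = (result_df.pivot(index='Subject', columns='Status', values='Count')
                 .fillna(0)
                 .astype(int))
print(wide)

# With matplotlib installed, this draws one group of bars per subject:
# wide.plot(kind='bar')
```

Passing stacked=True to plot would instead draw one stacked bar per subject, which may read better when every subject has the same number of students.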