Calculate the % of each occurrence of a column value and group by an ID

I have looked at many examples, but none of them is comprehensive enough. I have the following sample data:

import pandas as pd

df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

Teacher ID  Q1  Q2  Q3
123456       3   3   3
789456       2   3   2
123456       3   3   3
131415       4   3   3

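For each teacher ID, I want to know the percentage of times they gave a grade of 1, 2, 3, 4, or 5 for each column (question) above, and add each of these percentages to the DataFrame. Note that a teacher ID can appear more than once in the Teacher ID column.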

The output should look like this:

Teacher ID %Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q2 Graded 1
123456 0% 0% 50% 0%
789456 0% 25% 0% 0%
131415 0% 0% 0% 0%

Use DataFrame.melt to unpivot, then SeriesGroupBy.value_counts to count the values per Teacher ID and question (column Q), reshape with Series.unstack, add all missing grades (1 to 5 for each question) with DataFrame.reindex, and finally divide by the length of the original DataFrame:

# build every (question, grade) column pair; df.columns[1:] skips the first column (Teacher ID)
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = (df.melt('Teacher ID', var_name='Q')
         .groupby(['Teacher ID','Q'])['value']
         .value_counts()
         .unstack([1,2], fill_value=0)
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100)
         )
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
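
To make the chain above easier to follow, here is a minimal sketch that runs its first three steps one at a time. The sample data is an assumption: it uses the values implied by the printed results, with teacher 123456 appearing twice. The names long_df, counts, and wide are only illustrative.

```python
import pandas as pd

# Assumed sample data (teacher 123456 appears twice).
df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

# Step 1: unpivot to one row per (teacher, question, grade).
long_df = df.melt('Teacher ID', var_name='Q')
print(long_df.shape)                  # (12, 3): 4 rows x 3 questions

# Step 2: count how often each grade occurs per teacher and question.
counts = long_df.groupby(['Teacher ID', 'Q'])['value'].value_counts()
print(counts.loc[(123456, 'Q1', 3)])  # 2

# Step 3: move the question and grade levels into the columns.
wide = counts.unstack([1, 2], fill_value=0)
print(wide.shape)                     # (3, 6): only grades that actually occur
```

After step 3 only the observed (question, grade) pairs exist as columns, which is why the reindex with mux is needed to add the missing grades as zero-filled columns.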

An alternative solution with crosstab:

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = df.melt('Teacher ID', var_name='Q')

df1 = (pd.crosstab(df1['Teacher ID'], [df1['Q'], df1['value']])
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100))
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')

print (df1)
            %Q1 Graded 1  %Q1 Graded 2  %Q1 Graded 3  %Q1 Graded 4  \
Teacher ID                                                           
123456               0.0           0.0          50.0           0.0   
131415               0.0           0.0           0.0          25.0   
789456               0.0          25.0           0.0           0.0   

            %Q1 Graded 5  %Q2 Graded 1  %Q2 Graded 2  %Q2 Graded 3  \
Teacher ID                                                           
123456               0.0           0.0           0.0          50.0   
131415               0.0           0.0           0.0          25.0   
789456               0.0           0.0           0.0          25.0   

            %Q2 Graded 4  %Q2 Graded 5  %Q3 Graded 1  %Q3 Graded 2  \
Teacher ID                                                           
123456               0.0           0.0           0.0           0.0   
131415               0.0           0.0           0.0           0.0   
789456               0.0           0.0           0.0          25.0   

            %Q3 Graded 3  %Q3 Graded 4  %Q3 Graded 5  
Teacher ID                                            
123456              50.0           0.0           0.0  
131415              25.0           0.0           0.0  
789456               0.0           0.0           0.0  
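
As a quick sanity check, the sketch below builds the table both ways and compares the results. It assumes the sample data implied by the printed output; via_counts, via_crosstab, and same are hypothetical names introduced here.

```python
import pandas as pd

# Assumed sample data (teacher 123456 appears twice).
df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
long_df = df.melt('Teacher ID', var_name='Q')

# Variant 1: groupby + value_counts + unstack.
via_counts = (long_df.groupby(['Teacher ID', 'Q'])['value']
                     .value_counts()
                     .unstack([1, 2], fill_value=0)
                     .reindex(mux, axis=1, fill_value=0)
                     .div(len(df))
                     .mul(100))

# Variant 2: crosstab.
via_crosstab = (pd.crosstab(long_df['Teacher ID'],
                            [long_df['Q'], long_df['value']])
                  .reindex(mux, axis=1, fill_value=0)
                  .div(len(df))
                  .mul(100))

# Compare row labels and cell values.
same = (via_counts.index.equals(via_crosstab.index)
        and (via_counts.to_numpy() == via_crosstab.to_numpy()).all())
print(same)
```

On this data the two variants should be interchangeable, so the choice is mostly a matter of readability.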

To format the values as percentage strings:

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

df1 = (df.melt('Teacher ID', var_name='Q')
         .groupby(['Teacher ID','Q'])['value']
         .value_counts()
         .unstack([1,2], fill_value=0)
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         # DataFrame.applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there
         .applymap("{:.2%}".format)
         )
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
print (df1)
           %Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q1 Graded 4 %Q1 Graded 5  \
Teacher ID                                                                    
123456            0.00%        0.00%       50.00%        0.00%        0.00%   
131415            0.00%        0.00%        0.00%       25.00%        0.00%   
789456            0.00%       25.00%        0.00%        0.00%        0.00%   

           %Q2 Graded 1 %Q2 Graded 2 %Q2 Graded 3 %Q2 Graded 4 %Q2 Graded 5  \
Teacher ID                                                                    
123456            0.00%        0.00%       50.00%        0.00%        0.00%   
131415            0.00%        0.00%       25.00%        0.00%        0.00%   
789456            0.00%        0.00%       25.00%        0.00%        0.00%   

           %Q3 Graded 1 %Q3 Graded 2 %Q3 Graded 3 %Q3 Graded 4 %Q3 Graded 5  
Teacher ID                                                                   
123456            0.00%        0.00%       50.00%        0.00%        0.00%  
131415            0.00%        0.00%       25.00%        0.00%        0.00%  
789456            0.00%       25.00%        0.00%        0.00%        0.00%  
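
The question also asks to add the percentages to the DataFrame. One possible way, sketched here under the assumption that every original row should carry its teacher's percentages, is DataFrame.join on the Teacher ID column. The names pct, per_teacher, and merged are hypothetical, and the sample data is the one implied by the printed results.

```python
import pandas as pd

# Assumed sample data (teacher 123456 appears twice).
df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
long_df = df.melt('Teacher ID', var_name='Q')

# Numeric percentages per teacher (crosstab variant).
pct = (pd.crosstab(long_df['Teacher ID'], [long_df['Q'], long_df['value']])
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100))
pct.columns = pct.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')

# Option A: one row per teacher, with Teacher ID as a regular column.
per_teacher = pct.reset_index()

# Option B: attach the percentages to every original row.
merged = df.join(pct, on='Teacher ID')
print(merged[['Teacher ID', 'Q1', '%Q1 Graded 3']])
```

Joining the numeric version rather than the string-formatted one keeps the new columns usable for further calculations; formatting can be applied afterwards for display.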