Calculate the % of each occurrence of a column value and group by an ID
I have looked at many examples, but none of them is comprehensive enough. I have the following sample data:
    import pandas as pd

    df = pd.DataFrame({'Teacher ID': [123456, 789456, 101112, 131415],
                       'Q1': [3, 2, 4, 3],
                       'Q2': [3, 3, 3, 3],
                       'Q3': [3, 2, 3, 3]})
Teacher ID | Q1 | Q2 | Q3 |
---|---|---|---|
123456 | 3 | 3 | 3 |
789456 | 2 | 3 | 3 |
123456 | 3 | 3 | 3 |
131415 | 4 | 3 | 3 |
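For each teacher ID, I would like to know the percentage of times they graded each of the columns (questions) above as 1, 2, 3, 4 or 5, and to add each of these percentages to the dataframe. Note that a teacher ID can appear more than once in the Teacher ID column.
The output should look like this: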
Teacher ID | %Q1 Graded 1 | %Q1 Graded 2 | %Q1 Graded 3 | %Q2 Graded 1 |
---|---|---|---|---|
123456 | 0% | 0% | 50% | 0% |
789456 | 0% | 25% | 0% | 0% |
131415 | 0% | 0% | 0% | 0% |
Use `DataFrame.melt` to unpivot, then `SeriesGroupBy.value_counts` to count the values per `Teacher ID` and per question (column `Q`), reshape with `Series.unstack`, add all missing grades (1 to 5 for each question) with `reindex`, and finally divide by the length of the original DataFrame:
    # Build every (question, grade) pair; the questions are all columns except the first (Teacher ID)
    mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

    df1 = (df.melt('Teacher ID', var_name='Q')
             .groupby(['Teacher ID', 'Q'])['value']
             .value_counts()
             .unstack([1, 2], fill_value=0)
             .reindex(mux, axis=1, fill_value=0)
             .div(len(df))
             .mul(100)
          )
    # Flatten the (question, grade) pairs into labels such as '%Q1 Graded 3'
    df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
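To see what each step of the chain above contributes, it can be unpacked into intermediate variables. This is only a sketch: the sample frame below is an assumption, chosen so that the results match the output printed in this answer (teacher 123456 appears twice), and the names `long_df`, `counts`, `wide`, `full` and `pct` are illustrative.

```python
import pandas as pd

# Assumed sample data, picked to reproduce the printed results in this answer;
# it is not the exact frame from the question.
df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

# Step 1: wide -> long, one row per (teacher, question, grade)
long_df = df.melt('Teacher ID', var_name='Q')

# Step 2: count how often each grade occurs per teacher and question
counts = long_df.groupby(['Teacher ID', 'Q'])['value'].value_counts()

# Step 3: move question and grade from the index into the columns
wide = counts.unstack([1, 2], fill_value=0)

# Step 4: add the grades that never occur, so every question has columns 1..5
mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
full = wide.reindex(mux, axis=1, fill_value=0)

# Step 5: turn counts into a percentage of all rows in df
pct = full.div(len(df)).mul(100)

print(pct)
```

Printing `long_df`, `counts` and `wide` in turn shows how the data moves from one row per answer to one column per (question, grade) pair.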
Alternative solution with `crosstab`:
    mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

    df1 = df.melt('Teacher ID', var_name='Q')
    # Count each (question, grade) pair per teacher in a single step
    df1 = (pd.crosstab(df1['Teacher ID'], [df1['Q'], df1['value']])
             .reindex(mux, axis=1, fill_value=0)
             .div(len(df))
             .mul(100))
    df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
    print(df1)
                %Q1 Graded 1  %Q1 Graded 2  %Q1 Graded 3  %Q1 Graded 4  \
    Teacher ID
    123456               0.0           0.0          50.0           0.0
    131415               0.0           0.0           0.0          25.0
    789456               0.0          25.0           0.0           0.0

                %Q1 Graded 5  %Q2 Graded 1  %Q2 Graded 2  %Q2 Graded 3  \
    Teacher ID
    123456               0.0           0.0           0.0          50.0
    131415               0.0           0.0           0.0          25.0
    789456               0.0           0.0           0.0          25.0

                %Q2 Graded 4  %Q2 Graded 5  %Q3 Graded 1  %Q3 Graded 2  \
    Teacher ID
    123456               0.0           0.0           0.0           0.0
    131415               0.0           0.0           0.0           0.0
    789456               0.0           0.0           0.0          25.0

                %Q3 Graded 3  %Q3 Graded 4  %Q3 Graded 5
    Teacher ID
    123456              50.0           0.0           0.0
    131415              25.0           0.0           0.0
    789456               0.0           0.0           0.0
To display the values as percentage strings:
    mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])

    df1 = (df.melt('Teacher ID', var_name='Q')
             .groupby(['Teacher ID', 'Q'])['value']
             .value_counts()
             .unstack([1, 2], fill_value=0)
             .reindex(mux, axis=1, fill_value=0)
             .div(len(df))
             # Format each fraction as a string such as '25.00%';
             # on pandas >= 2.1, DataFrame.map replaces the deprecated applymap
             .applymap("{:.2%}".format)
          )
    df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')
    print(df1)
               %Q1 Graded 1 %Q1 Graded 2 %Q1 Graded 3 %Q1 Graded 4 %Q1 Graded 5  \
    Teacher ID
    123456            0.00%        0.00%       50.00%        0.00%        0.00%
    131415            0.00%        0.00%        0.00%       25.00%        0.00%
    789456            0.00%       25.00%        0.00%        0.00%        0.00%

               %Q2 Graded 1 %Q2 Graded 2 %Q2 Graded 3 %Q2 Graded 4 %Q2 Graded 5  \
    Teacher ID
    123456            0.00%        0.00%       50.00%        0.00%        0.00%
    131415            0.00%        0.00%       25.00%        0.00%        0.00%
    789456            0.00%        0.00%       25.00%        0.00%        0.00%

               %Q3 Graded 1 %Q3 Graded 2 %Q3 Graded 3 %Q3 Graded 4 %Q3 Graded 5
    Teacher ID
    123456            0.00%        0.00%       50.00%        0.00%        0.00%
    131415            0.00%        0.00%       25.00%        0.00%        0.00%
    789456            0.00%       25.00%        0.00%        0.00%        0.00%
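The question also asks for the percentages to be added to the dataframe. One way this might be done is to join the per-teacher result back onto the original rows by `Teacher ID`. The sketch below assumes the same sample data as the printed output (teacher 123456 appearing twice) and uses the `crosstab` variant; `long_df` and `out` are illustrative names.

```python
import pandas as pd

# Assumed sample data, picked to reproduce the printed results in this answer
df = pd.DataFrame({'Teacher ID': [123456, 789456, 123456, 131415],
                   'Q1': [3, 2, 3, 4],
                   'Q2': [3, 3, 3, 3],
                   'Q3': [3, 2, 3, 3]})

mux = pd.MultiIndex.from_product([df.columns[1:], range(1, 6)])
long_df = df.melt('Teacher ID', var_name='Q')

# Percentage of all rows per teacher, question and grade
df1 = (pd.crosstab(long_df['Teacher ID'], [long_df['Q'], long_df['value']])
         .reindex(mux, axis=1, fill_value=0)
         .div(len(df))
         .mul(100))
df1.columns = df1.columns.map(lambda x: f'%{x[0]} Graded {x[1]}')

# Left join: every original row receives the percentages of its teacher,
# so a teacher that appears twice gets the same values on both rows
out = df.join(df1, on='Teacher ID')

print(out[['Teacher ID', 'Q1', '%Q1 Graded 3', '%Q1 Graded 4']])
```

Keeping the numeric version of `df1` for the join and applying any string formatting afterwards leaves the joined columns usable for further calculations.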