如何在 Pandas 中找到分组后的百分比?
How to find percentage after group by in Pandas?
我希望它采用一种形式,我可以为每个金额桶获得 Kept %、Broken % 和 F_P2P%。
例子:
期望输出
due_amount_quantiles Kept % Broken % F_P2P Unique User Id
(-0.001, 767.91] 5 10 85 100
(767.91, 1477.935] 10 20 70 100
当前输出
due_amount_quantiles P2P_Status Unique User Id
0 (-0.001, 767.91] Broken 60823
1 (-0.001, 767.91] F_P2P 1324
2 (-0.001, 767.91] Kept 32458
3 (767.91, 1477.935] Broken 68638
4 (767.91, 1477.935] F_P2P 1477
5 (767.91, 1477.935] Kept 24489
6 (1477.935, 2853.447] Broken 69048
7 (1477.935, 2853.447] F_P2P 2106
8 (1477.935, 2853.447] Kept 23450
9 (2853.447, 25175.06] Broken 63811
我该如何实现? unstack 无效。
使用DataFrame.pivot
with divide sum values, if duplicates use DataFrame.pivot_table
:
#if there are unique due_amount_quantiles, P2P_Status tuples
df = df.pivot('due_amount_quantiles','P2P_Status','Unique User Id')
#if there ara duplicated due_amount_quantiles, P2P_Status tuples is necessary aggregate, here by `sum`
df = df.pivot_table(index='due_amount_quantiles',columns='P2P_Status',values='Unique User Id', aggfunc='sum')
df = df.div(df.sum(axis=1), axis=0).add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 NaN NaN 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100
或者:
df = pd.crosstab(index=df['due_amount_quantiles'],
columns=df['P2P_Status'],
values=df['Unique User Id'],
aggfunc='sum',
normalize='index').add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 0.000000 0.000000 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100
我希望它采用一种形式,我可以为每个金额桶获得 Kept %、Broken % 和 F_P2P%。 例子: 期望输出
due_amount_quantiles Kept % Broken % F_P2P Unique User Id
(-0.001, 767.91] 5 10 85 100
(767.91, 1477.935] 10 20 70 100
当前输出
due_amount_quantiles P2P_Status Unique User Id
0 (-0.001, 767.91] Broken 60823
1 (-0.001, 767.91] F_P2P 1324
2 (-0.001, 767.91] Kept 32458
3 (767.91, 1477.935] Broken 68638
4 (767.91, 1477.935] F_P2P 1477
5 (767.91, 1477.935] Kept 24489
6 (1477.935, 2853.447] Broken 69048
7 (1477.935, 2853.447] F_P2P 2106
8 (1477.935, 2853.447] Kept 23450
9 (2853.447, 25175.06] Broken 63811
我该如何实现? unstack 无效。
使用DataFrame.pivot
with divide sum values, if duplicates use DataFrame.pivot_table
:
#if there are unique due_amount_quantiles, P2P_Status tuples
df = df.pivot('due_amount_quantiles','P2P_Status','Unique User Id')
#if there ara duplicated due_amount_quantiles, P2P_Status tuples is necessary aggregate, here by `sum`
df = df.pivot_table(index='due_amount_quantiles',columns='P2P_Status',values='Unique User Id', aggfunc='sum')
df = df.div(df.sum(axis=1), axis=0).add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 NaN NaN 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100
或者:
df = pd.crosstab(index=df['due_amount_quantiles'],
columns=df['P2P_Status'],
values=df['Unique User Id'],
aggfunc='sum',
normalize='index').add_prefix('% ').mul(100).assign(Total = 100)
print (df)
P2P_Status % Broken % F_P2P % Kept Total
due_amount_quantiles
(-0.001, 767.91] 64.291528 1.399503 34.308969 100
(1477.935, 2853.447] 72.986343 2.226122 24.787535 100
(2853.447, 25175.06] 100.000000 0.000000 0.000000 100
(767.91, 1477.935] 72.552958 1.561245 25.885798 100