Cumulative percentage of pandas data frame
I have a data frame like the one below, with specific IDs (code) and the areas and lengths for specific distances (Dist_km):
code Dist_km Shape_Leng Shape_Area
0 M0017 5.0 57516.601608 5.076465e+07
1 M0017 10.0 94037.663673 4.638184e+07
2 M0017 15.0 39106.310470 1.426327e+07
3 M0017 20.0 138.038115 6.464380e+02
4 M0017 30.0 12158.395200 4.102351e+06
5 M0073 5.0 51922.847698 3.375080e+07
6 M0073 10.0 75543.660382 5.966612e+07
7 M0073 15.0 55277.027428 3.423961e+07
8 M0073 20.0 26945.782055 2.584022e+07
9 M0073 25.0 4052.670711 6.904536e+05
10 M0333 5.0 30090.687597 5.468791e+07
11 M0333 10.0 55946.815385 5.768929e+07
12 M0333 15.0 65026.329732 4.008600e+07
13 M0333 20.0 59014.487216 2.994337e+07
14 M0333 25.0 17423.635441 6.358991e+06
Using:
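# groupby().cumsum() keeps the original row index, so it can be assigned back directly
# (groupby().apply(lambda x: x.cumsum()) prepends the group key in pandas >= 2.0 and the assignment fails)
mrb['cum_area_sqm'] = mrb.groupby('code')['Shape_Area'].cumsum()
mrb['cum_area_ha'] = mrb['cum_area_sqm'] / 10000  # m² -> hectares
mrb_cumsum = mrb.groupby(['code', 'Dist_km']).agg({'cum_area_ha': 'sum'})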
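Below is a self-contained sketch of these steps. It rebuilds the sample frame from the question, keeping only code, Dist_km and Shape_Area, because Shape_Leng is not used. It relies on groupby(...).cumsum(), which returns a Series aligned to the original row index. The Shape_Area values in the question are printed to seven significant figures, so the totals match the question's output only approximately.

```python
import pandas as pd

# Sample data copied from the question (Shape_Leng omitted; it is not used here)
mrb = pd.DataFrame({
    "code": ["M0017"] * 5 + ["M0073"] * 5 + ["M0333"] * 5,
    "Dist_km": [5.0, 10.0, 15.0, 20.0, 30.0,
                5.0, 10.0, 15.0, 20.0, 25.0,
                5.0, 10.0, 15.0, 20.0, 25.0],
    "Shape_Area": [5.076465e7, 4.638184e7, 1.426327e7, 6.464380e2, 4.102351e6,
                   3.375080e7, 5.966612e7, 3.423961e7, 2.584022e7, 6.904536e5,
                   5.468791e7, 5.768929e7, 4.008600e7, 2.994337e7, 6.358991e6],
})

# Running total of area within each code (m²), then converted to hectares
mrb["cum_area_sqm"] = mrb.groupby("code")["Shape_Area"].cumsum()
mrb["cum_area_ha"] = mrb["cum_area_sqm"] / 10_000

# One row per (code, Dist_km), indexed by both columns
mrb_cumsum = mrb.groupby(["code", "Dist_km"]).agg({"cum_area_ha": "sum"})
print(mrb_cumsum)
```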
I have managed to convert the data frame into the format below:
cum_area_ha
code Dist_km
M0017 5.0 5076.464548
10.0 9714.648238
15.0 11140.974881
20.0 11141.039525
30.0 11551.274623
M0073 5.0 3375.080465
10.0 9341.692680
15.0 12765.654064
20.0 15349.676332
25.0 15418.721691
M0333 5.0 5468.790981
10.0 11237.720454
15.0 15246.320869
20.0 18240.658255
25.0 18876.557351
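However, I would now like to get, for each code, the cumulative percentage of these areas across Dist_km, reaching 100% at the last distance.
So, for M0017 for example, I would like something like this: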
cum_area_ha cum_area_pc
code Dist_km
M0017 5.0 5076.464548 43.95
10.0 9714.648238 84.10
15.0 11140.974881 96.45
20.0 11141.039525 96.45
30.0 11551.274623 100.00
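As a formula, the requested column is each row's running total divided by the final running total of its code group. That final total equals the group's total area. Here row k runs from 1 to n within one code:

```latex
\text{cum\_area\_pc}_{k} = 100 \times \frac{\text{cum\_area\_ha}_{k}}{\text{cum\_area\_ha}_{n}}

\text{M0017, 5 km: } 100 \times \frac{5076.464548}{11551.274623} \approx 43.95
```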
You can divide each element by the last cum_area_ha in the same code group:
mrb_cumsum.div(mrb_cumsum.groupby(level=0).last())
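Below is a runnable sketch of a variant of this division. It assumes mrb_cumsum is indexed by (code, Dist_km) as above. Instead of aligning against groupby(level=0).last(), it uses groupby(level="code").transform("last"). That copies each code's final total back onto every row, so the division is a plain element-wise operation. Multiplying by 100 and rounding then gives a cum_area_pc column in the form the question asks for. The frame is built from the question's cum_area_ha values for M0017 and M0073 only.

```python
import pandas as pd

# Cumulative hectares per (code, Dist_km), values taken from the question's output
idx = pd.MultiIndex.from_tuples(
    [("M0017", 5.0), ("M0017", 10.0), ("M0017", 15.0), ("M0017", 20.0), ("M0017", 30.0),
     ("M0073", 5.0), ("M0073", 10.0), ("M0073", 15.0), ("M0073", 20.0), ("M0073", 25.0)],
    names=["code", "Dist_km"],
)
mrb_cumsum = pd.DataFrame(
    {"cum_area_ha": [5076.464548, 9714.648238, 11140.974881, 11141.039525, 11551.274623,
                     3375.080465, 9341.692680, 12765.654064, 15349.676332, 15418.721691]},
    index=idx,
)

# Last running total of each code, repeated on every row of that code
group_total = mrb_cumsum.groupby(level="code")["cum_area_ha"].transform("last")

# Share of the group total reached at each distance, as a percentage
mrb_cumsum["cum_area_pc"] = (mrb_cumsum["cum_area_ha"] / group_total * 100).round(2)
print(mrb_cumsum)
```

Keeping the result as a new column leaves cum_area_ha intact. Calling .div() on the whole frame would replace it with fractions.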
Out[97]:
cum_area_ha
code Dist_km
M0017 5.0 0.439472
10.0 0.841002
15.0 0.964480
20.0 0.964486
30.0 1.000000
M0073 5.0 0.218895
10.0 0.605867
15.0 0.827932
20.0 0.995522
25.0 1.000000
M0333 5.0 0.289713
10.0 0.595327
15.0 0.807685
20.0 0.966313
25.0 1.000000
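Each code's last running total is simply that code's total area. So the percentage can also be computed straight from the original mrb rows, without building mrb_cumsum first. Below is a sketch of that alternative. It assumes the question's column names and uses only the M0017 rows. It divides a grouped cumsum() by a grouped transform("sum").

```python
import pandas as pd

# M0017 rows from the question (Shape_Area in m²)
mrb = pd.DataFrame({
    "code": ["M0017"] * 5,
    "Dist_km": [5.0, 10.0, 15.0, 20.0, 30.0],
    "Shape_Area": [5.076465e7, 4.638184e7, 1.426327e7, 6.464380e2, 4.102351e6],
})

grouped = mrb.groupby("code")["Shape_Area"]
# Running total divided by the group's grand total, as a percentage
mrb["cum_area_pc"] = (grouped.cumsum() / grouped.transform("sum") * 100).round(2)
print(mrb[["code", "Dist_km", "cum_area_pc"]])
```

The ratio does not depend on units, so the conversion to hectares is not needed for the percentage itself.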