Pandas: Create Percentages from Category Sums
I have a dataset of website traffic covering roughly 2,000 websites over one month, tabulated by the type of device that generated the traffic:
In [12]: df.sample(10)
Out[12]:
          date                 device  nb_uniq_visitors  site_id
11  2017-10-31                     Tv               0.0   3331.0
6   2017-10-22            Car browser               0.0    503.0
7   2017-10-22                 Camera               0.0   3259.0
7   2017-10-08            Car browser               0.0    630.0
3   2017-10-23                 Camera               0.0    118.0
0   2017-10-12                Desktop               1.0   4769.0
11  2017-10-31                     Tv               0.0    361.0
5   2017-10-12                Phablet               0.0   2999.0
9   2017-10-17  Portable media player               0.0   1725.0
0   2017-10-13                Desktop            2410.0   1004.0
4   2017-10-13                    all             900.0   1271.0
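Note that the 'all' category in the device column is the total across all devices, so it can serve as the denominator for the percentage calculation.
I would like to see the share of each device type per website. The output I have in mind looks like this (I calculated the example below by hand):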
                               nb_uniq_visitors
site_id device
74.0    Camera                             0.00
        Car browser                        0.00
        Console                            0.00
        Desktop                            0.56
        Feature phone                      0.00
        Phablet                            0.01
        Portable media player              0.00
        Smart display                      0.00
        Smartphone                         0.37
        Tablet                             0.05
        Tv                                 0.00
        Unknown                            0.00
        all                                1.00
96.0    Camera                             0.00
        Car browser                        0.00
        Console                            0.00
        Desktop                            0.64
        Feature phone                      0.00
        Phablet                            0.01
        Portable media player              0.00
        Smart display                      0.00
        Smartphone                         0.29
        Tablet                             0.06
        Tv                                 0.00
        Unknown                            0.01
        all                                1.00
I grouped by site_id and device using groupby:
In [23]: sl = df.groupby(['site_id', 'device']).sum()
In [24]: sl.head(25)
Out[24]:
                               nb_uniq_visitors
site_id device
74.0    Camera                              0.0
        Car browser                         0.0
        Console                             1.0
        Desktop                         10534.0
        Feature phone                       0.0
        Phablet                           178.0
        Portable media player               4.0
        Smart display                       0.0
        Smartphone                       6955.0
        Tablet                           1022.0
        Tv                                  1.0
        Unknown                            62.0
        all                             18757.0
96.0    Camera                              0.0
        Car browser                         2.0
        Console                             6.0
        Desktop                        118157.0
        Feature phone                       0.0
        Phablet                          1061.0
        Portable media player              73.0
        Smart display                       0.0
        Smartphone                      53292.0
        Tablet                          11060.0
        Tv                                  2.0
        Unknown                          1717.0
        all                            185370.0
How can I convert the summed values above into percentages? Or is there a better approach?
Use DataFrame.xs to select the 'all' rows, then divide by them with DataFrame.div:
sl = df.groupby(['site_id', 'device']).sum()
a = sl.div(sl.xs('all', level=1))
print(a)
                               nb_uniq_visitors
site_id device
74.0    Camera                         0.000000
        Car browser                    0.000000
        Console                        0.000053
        Desktop                        0.561604
        Feature phone                  0.000000
        Phablet                        0.009490
        Portable media player          0.000213
        Smart display                  0.000000
        Smartphone                     0.370795
        Tablet                         0.054486
        Tv                             0.000053
        Unknown                        0.003305
        all                            1.000000
96.0    Camera                         0.000000
        Car browser                    0.000011
        Console                        0.000032
        Desktop                        0.637412
        Feature phone                  0.000000
        Phablet                        0.005724
        Portable media player          0.000394
        Smart display                  0.000000
        Smartphone                     0.287490
        Tablet                         0.059664
        Tv                             0.000011
        Unknown                        0.009263
        all                            1.000000
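For anyone who wants to try this without the original dataset, here is a minimal sketch. The toy frame is an assumption: it holds only three devices plus the 'all' row for the two sites shown above, using the sums from the question. Selecting the nb_uniq_visitors column explicitly and rounding to two decimals are my additions; on recent pandas versions a non-numeric date column may not be dropped automatically by sum(), so naming the column is the safer habit.

```python
import pandas as pd

# Toy data (assumption): a few per-device totals taken from the question.
df = pd.DataFrame({
    "site_id": [74.0] * 4 + [96.0] * 4,
    "device": ["Desktop", "Smartphone", "Tablet", "all"] * 2,
    "nb_uniq_visitors": [10534.0, 6955.0, 1022.0, 18757.0,
                         118157.0, 53292.0, 11060.0, 185370.0],
})

# Sum visitors per (site_id, device); select the numeric column explicitly.
sl = df.groupby(["site_id", "device"])[["nb_uniq_visitors"]].sum()

# The 'all' rows, indexed by site_id only, are the per-site denominators.
totals = sl.xs("all", level="device")

# Division aligns on the shared site_id level; round to match the desired output.
shares = sl.div(totals).round(2)
print(shares)
```

The rounded values for Desktop, Smartphone and Tablet should line up with the hand-calculated table in the question.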
Details:
print(sl.xs('all', level=1))
         nb_uniq_visitors
site_id
74.0              18757.0
96.0             185370.0
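The division works because the denominator frame is indexed by site_id, which matches one level of the grouped frame's index. The sketch below spells that out by passing level="site_id" to DataFrame.div, and multiplies by 100 to get percentages. The miniature sl built here is hypothetical (two devices per site, values copied from the sums above), and the explicit level argument is my variation, not part of the answer.

```python
import pandas as pd

# Hypothetical miniature of `sl`: summed visitors indexed by (site_id, device).
index = pd.MultiIndex.from_tuples(
    [(74.0, "Desktop"), (74.0, "all"), (96.0, "Desktop"), (96.0, "all")],
    names=["site_id", "device"],
)
sl = pd.DataFrame(
    {"nb_uniq_visitors": [10534.0, 18757.0, 118157.0, 185370.0]},
    index=index,
)

# xs drops the device level, leaving one denominator row per site_id.
totals = sl.xs("all", level="device")

# level="site_id" broadcasts each site's total across that site's devices.
pct = sl.div(totals, level="site_id").mul(100)
print(pct)
```

Naming the level makes the intent visible to readers and avoids relying on positional level numbers such as level=1.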