Get mean per column per partition in Pandas

Question

I am trying to get the mean of each column for each partition of a DataFrame such as this one:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds      2     20
2      UK     Leeds      3     21
3      RO      Cluj      4     24
4      RO      Cluj      5     25
5      RO Bucharest      6     25

That is, I want the mean of sales and stock, aggregated over the unique combinations of country and city. The resulting DataFrame should therefore be:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds    2.5   20.5
2      RO      Cluj    4.5   24.5
3      RO Bucharest      6     25

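The duplicate rows within each country-city partition have been aggregated into a single row holding the mean values.

I have looked at the documentation for pandas.DataFrame.mean() and at related questions and answers, but none of them helped in a straightforward way. Any help is appreciated.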

Answer 1

groupby

df.groupby(['country', 'city']).mean()

                   sales  stock
country city                   
RO      Bucharest    6.0   25.0
        Cluj         4.5   24.5
UK      Leeds        2.5   20.5
        London       1.0   34.0

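Setting the index

df.set_index(['country', 'city']).groupby(level=[0, 1]).mean()

(Older pandas versions also accepted .mean(level=[0, 1]) here; the level argument was deprecated and removed in pandas 2.0.)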
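A self-contained sketch of the index-based approach, rebuilding the sample data from the question and grouping on the index levels by name. The variable names (indexed, by_level) are only illustrative, and this assumes a reasonably recent pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["UK", "UK", "UK", "RO", "RO", "RO"],
    "city": ["London", "Leeds", "Leeds", "Cluj", "Cluj", "Bucharest"],
    "sales": [1, 2, 3, 4, 5, 6],
    "stock": [34, 20, 21, 24, 25, 25],
})

# Move the partition keys into a MultiIndex, then group on those levels.
indexed = df.set_index(["country", "city"])
by_level = indexed.groupby(level=["country", "city"]).mean()
```

The result keeps country and city as a MultiIndex; calling by_level.reset_index() turns them back into ordinary columns if that is preferred.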

Without setting the index

df.groupby(['country', 'city'], as_index=False, sort=False).mean()


  country       city  sales  stock
0      UK     London    1.0   34.0
1      UK      Leeds    2.5   20.5
2      RO       Cluj    4.5   24.5
3      RO  Bucharest    6.0   25.0
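To reproduce this last result end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same call. The variable name result is just an illustrative choice.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "country": ["UK", "UK", "UK", "RO", "RO", "RO"],
    "city": ["London", "Leeds", "Leeds", "Cluj", "Cluj", "Bucharest"],
    "sales": [1, 2, 3, 4, 5, 6],
    "stock": [34, 20, 21, 24, 25, 25],
})

# as_index=False keeps country and city as regular columns;
# sort=False keeps the groups in order of first appearance.
result = df.groupby(["country", "city"], as_index=False, sort=False).mean()
print(result)
```

Because both grouping keys stay as columns and the original row order is preserved, this variant matches the layout asked for in the question most closely.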
