Pandas Condensing Grouping

I'm currently trying to use the groupby function in Pandas to condense some CSV data.

Here is a small sample of the data I currently have in the CSV:

Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16,   For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,    
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit

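As it stands, the "School" column contains many duplicates (Carrington, DeVry, etc.), and I'd like to condense them. More specifically, I want one row per unique school. That row should sum Number across all instances of the school, while keeping the name of the company that owns the school (the first column) and the type of school (the last column).

The final product would look like this: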

Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,18,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,8,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,25,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,29,For-Profit

I used the following code:

data2 = data.groupby("School").sum()

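However, when I do this, I also lose the company and type attached to each school. I know the solution is probably fairly basic, but I'm new to Pandas, so any help you can offer would be much appreciated!

You can provide a list of columns to group by: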

data2 = data.groupby(["School", "Company", "Type"]).sum()
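A minimal, self-contained sketch of this approach, assuming the CSV has been loaded into a DataFrame named data (built here from a few of the sample rows via io.StringIO, which is only a stand-in for the real file). Passing as_index=False keeps the grouping columns as ordinary columns instead of moving them into the index. Note that this only collapses rows whose Company and Type match exactly, so rows with a blank or whitespace-padded Type would end up in separate groups.

```python
import io
import pandas as pd

# A few rows from the question's sample, inlined so the example runs on its own.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""
data = pd.read_csv(io.StringIO(csv_text))

# Group by every column that should be preserved, then sum only Number.
data2 = data.groupby(["Company", "School", "Type"], as_index=False)["Number"].sum()
print(data2)
```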

I would use groupby + agg:

df.groupby('School', as_index=False)\
    .agg({'Company' : 'first', 'Type' : 'first', 'Number' : 'sum'})

                                    School                        Company  \
0                               Carrington  Adtelem Global Education Inc.   
1              DeVry Institute of Learning  Adtelem Global Education Inc.   
2            DeVry Institute of Technology  Adtelem Global Education Inc.   
3  Le Cordon Blue College of Culinary Arts   Career Education Corporation   

   Number        Type  
0      18  For-Profit  
1      25  For-Profit  
2       8  For-Profit  
3      29  For-Profit 
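The sample data contains a Type value padded with spaces and another that is blank, so 'first' may return a padded string unless the column is cleaned beforehand. The sketch below is one possible way to handle that, assuming blank types should be treated as missing; the DataFrame is rebuilt by hand from four of the sample rows purely for illustration.

```python
import pandas as pd

# Four sample rows, including the padded and blank Type values from the question.
df = pd.DataFrame({
    "Company": ["Adtelem Global Education Inc."] * 2
               + ["Career Education Corporation"] * 2,
    "School": ["DeVry Institute of Learning"] * 2
              + ["Le Cordon Blue College of Culinary Arts"] * 2,
    "Number": [16, 9, 6, 23],
    "Type": ["   For-Profit", "    ", "For-Profit", "For-Profit"],
})

# Strip surrounding whitespace, then turn empty strings into missing values
# (assumption: a blank Type means "unknown", not a distinct category).
cleaned = df["Type"].str.strip()
df["Type"] = cleaned.mask(cleaned == "")

# 'first' takes the first non-missing value in each group; 'sum' totals Number.
result = df.groupby("School", as_index=False).agg(
    {"Company": "first", "Type": "first", "Number": "sum"}
)
print(result)
```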

I think it is better to aggregate all columns explicitly.
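As a sketch of what explicit aggregation can look like, pandas also supports named aggregation, where each output column is spelled out as a (source column, function) pair. The example below additionally reorders the columns to match the layout requested in the question and renders the result as CSV text. The inline DataFrame and the names summary and csv_out are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# A few of the sample rows, built inline so the example is self-contained.
df = pd.DataFrame({
    "Company": ["Adtelem Global Education Inc."] * 4
               + ["Career Education Corporation"] * 2,
    "School": ["Carrington"] * 2
              + ["DeVry Institute of Technology"] * 2
              + ["Le Cordon Blue College of Culinary Arts"] * 2,
    "Number": [3, 4, 4, 4, 6, 23],
    "Type": ["For-Profit"] * 6,
})

# Named aggregation: every output column states its source and its function.
summary = df.groupby("School", as_index=False).agg(
    Company=("Company", "first"),
    Number=("Number", "sum"),
    Type=("Type", "first"),
)

# Reorder the columns to match the desired output layout.
summary = summary[["Company", "School", "Number", "Type"]]

# Render as CSV text; pass a file path to to_csv instead to write a file.
csv_out = summary.to_csv(index=False)
print(csv_out)
```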