Pandas Condensing Grouping
I'm currently trying to use the groupby function in Pandas to consolidate some CSV data.
Here is a small sample of the data I currently have in the CSV:
Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16, For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
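As it stands, the "School" column repeats the same school (Carrington, DeVry, etc.) many times, and I would like to condense those repeats. More specifically, I want 1 row for each unique school, where that row sums the numbers across all instances of that school but keeps the name of the company that owns the school (the first column) and the type of the school (the last column).
The final product would look like this: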
Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,18,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,8,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,25,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,29,For-Profit
I used the following code:
data2 = data.groupby("School").sum()
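However, when I do this, I also lose the Company and Type associated with each school. I know the solution is fairly basic, but I'm new to Pandas, so I'd really appreciate any help you can provide!
You can provide a list of columns to group by: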
data2 = data.groupby(["School", "Company", "Type"]).sum()
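As a minimal sketch of the multi-column grouping above, the snippet below loads a trimmed subset of the question's sample from an inline string (the `csv_text` variable and the use of `io.StringIO` are assumptions for illustration; the real data would come from the CSV file). It also passes `as_index=False`, which is not in the answer, so the group keys come back as ordinary columns instead of a MultiIndex.

```python
import io
import pandas as pd

# Trimmed subset of the question's sample; Type values are assumed clean here.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""
data = pd.read_csv(io.StringIO(csv_text))

# Grouping by every descriptive column keeps those columns in the result.
# as_index=False returns them as regular columns rather than index levels.
data2 = data.groupby(["School", "Company", "Type"], as_index=False)["Number"].sum()

# Restore the column order used in the question.
data2 = data2[["Company", "School", "Number", "Type"]]
print(data2)
```

One caveat with this approach: by default, groupby drops rows whose group key is missing, so a row with an empty Type (as in the full sample) would be left out of the sums. It also produces more than one row per school if Company or Type is spelled inconsistently.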
I would use groupby + agg:
df.groupby('School', as_index=False)\
.agg({'Company' : 'first', 'Type' : 'first', 'Number' : 'sum'})
                                    School                        Company  Number        Type
0                               Carrington  Adtelem Global Education Inc.      18  For-Profit
1              DeVry Institute of Learning  Adtelem Global Education Inc.      25  For-Profit
2            DeVry Institute of Technology  Adtelem Global Education Inc.       8  For-Profit
3  Le Cordon Blue College of Culinary Arts   Career Education Corporation      29  For-Profit
I think it is better to aggregate all columns explicitly.
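The groupby + agg approach above can be tried end to end on the question's full sample. This is only a sketch: the inline `csv_text` string and the `skipinitialspace=True` option are assumptions added here, the latter to strip the stray blank in `, For-Profit`. The sample also contains one row with an empty Type; since the `'first'` aggregation skips missing values by default, that row still contributes to the sum for its school.

```python
import io
import pandas as pd

# The question's sample, including the stray space and the empty Type value.
csv_text = """Company,School,Number,Type
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,1,For-Profit
Adtelem Global Education Inc.,Carrington,4,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,Carrington,3,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Technology,4,For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,16, For-Profit
Adtelem Global Education Inc.,DeVry Institute of Learning,9,
Career Education Corporation,Le Cordon Blue College of Culinary Arts,6,For-Profit
Career Education Corporation,Le Cordon Blue College of Culinary Arts,23,For-Profit
"""
# skipinitialspace=True strips the blank that follows the comma in ", For-Profit".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# One row per school: keep the first non-missing Company and Type, sum Number.
result = (
    df.groupby("School", as_index=False)
      .agg({"Company": "first", "Type": "first", "Number": "sum"})
)

# Restore the column order used in the question.
result = result[["Company", "School", "Number", "Type"]]
print(result)
```

Grouping only by School guarantees a single row per school, at the cost of silently picking one Company and Type when a school has conflicting values.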