Groupby of multiple columns

Question

I have

I want to group it by the values of the "Process" column. If I use this function:

df_new = df.groupby(['Process'])['Assists'].apply(lambda x: ','.join(x.astype(str))).reset_index()

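it looks like this

My problem is that I still need the "Date" column so that I can filter by date afterwards. The "Date" for a given process must always be the same, so it would be good if the dates were also "merged" or "grouped".

I want

Thank you very much.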

Answer 1

If you want to group by both Process and Date, try:

df_new = df.groupby(["Process", "Date"], as_index=False)["Assists"].apply(
    lambda x: ",".join(x.astype(str))
)

print(df_new)

Prints:

  Process        Date Assists
0   23d34  13.10.2020     0,0
1   23d4t  14.10.2020       1
2   56z45  13.10.2020       3
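For anyone who wants to run this end to end, here is a minimal sketch. The input frame is an assumption, reconstructed from the printed output above (the question's original data is not shown), and it uses `agg` on a dict rather than `apply`, which should give the same result here. It also shows the date filter the question asks about.

```python
import pandas as pd

# Hypothetical sample data, reconstructed from the printed output above.
df = pd.DataFrame(
    {
        "Process": ["23d34", "23d34", "23d4t", "56z45"],
        "Date": ["13.10.2020", "13.10.2020", "14.10.2020", "13.10.2020"],
        "Assists": [0, 0, 1, 3],
    }
)

# Group by both columns so that Date survives as a regular column.
df_new = (
    df.astype({"Assists": "str"})
    .groupby(["Process", "Date"], as_index=False)
    .agg({"Assists": ",".join})
)

# Filter by date afterwards; parsing avoids comparing day-first strings.
parsed = pd.to_datetime(df_new["Date"], format="%d.%m.%Y")
on_13th = df_new[parsed == pd.Timestamp("2020-10-13")]
print(on_13th)
```

Because Date is part of the grouping key, a process that appeared with two different dates would end up as two rows instead of one.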

Answer 2

You can use an aggregation function such as min, max, first, or last for the Date column:

df_new = (
    df.astype({'Assists': 'str'})
    .groupby('Process', as_index=False)
    .agg({'Assists': ','.join, 'Date': 'min'})
)
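As a rough sketch of this approach, the snippet below groups by Process only and picks the Date with `'first'`. The sample frame is hypothetical (the question's data is not shown). Note that `'min'` on string dates such as `13.10.2020` compares them as text, not chronologically. That is harmless when every process has a single date, which the check at the top verifies.

```python
import pandas as pd

# Hypothetical sample data; the question's original frame is not shown.
df = pd.DataFrame(
    {
        "Process": ["23d34", "23d34", "23d4t", "56z45"],
        "Date": ["13.10.2020", "13.10.2020", "14.10.2020", "13.10.2020"],
        "Assists": [0, 0, 1, 3],
    }
)

# Sanity check: does every process really have exactly one date?
one_date_per_process = bool(df.groupby("Process")["Date"].nunique().eq(1).all())

# Join the assists as strings and keep one Date per process.
df_new = (
    df.astype({"Assists": "str"})
    .groupby("Process", as_index=False)
    .agg({"Assists": ",".join, "Date": "first"})
)
print(df_new)
```

If the check returned False, grouping by both Process and Date would probably be the safer choice, since picking one date would silently drop the others.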
