pandas groupby count string occurrence over column

Question

I want to count the occurrences of a string in a column of a grouped pandas DataFrame.

Say I have the following DataFrame:

catA    catB    scores
A       X       6-4 RET
A       X       6-4 6-4
A       Y       6-3 RET
B       Z       6-0 RET
B       Z       6-1 RET
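To reproduce the example, the DataFrame above can be built directly. The values are copied from the table, and the name `df` matches the one used in the rest of the post.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})
print(df)
```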

First, I want to group by catA and catB. For each of these groups, I want to count the occurrences of RET in the scores column.

The result should look like this:

catA    catB    RET
A       X       1
A       Y       1
B       Z       2

Grouping by the two columns is easy: grouped = df.groupby(['catA', 'catB'])

But what is the next step?

Answer 1

Call apply on the 'scores' column of the groupby object, use the vectorised str method contains to filter each group, and then call count:

In [34]:    
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())

Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64
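A vectorised alternative (a different technique from the apply above, shown as a sketch rather than the answer's own method) is to build one boolean Series for the whole frame and sum it per group, since summing booleans counts the True values. Renaming the result to RET and calling reset_index reproduces the exact layout asked for in the question. Passing `regex=False` is an assumption that RET should be matched literally.

```python
import pandas as pd

df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# True where the score string contains the literal substring "RET"
has_ret = df["scores"].str.contains("RET", regex=False)

# Group the boolean Series by the two category columns; sum() counts the True values
result = (
    has_ret.groupby([df["catA"], df["catB"]])
    .sum()
    .rename("RET")   # name the count column as in the desired output
    .reset_index()   # turn catA/catB back into ordinary columns
)
print(result)
```

This avoids calling a Python lambda once per group. If a single cell could contain RET more than once and every occurrence should be counted, `df["scores"].str.count("RET")` could replace the boolean Series; for this data both give the same counts.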

To assign the result as a column, use transform so that the aggregation returns a Series whose index is aligned with the original df:

In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df

Out[35]:
  catA catB   scores count
0    A    X  6-4 RET     1
1    A    X  6-4 6-4     1
2    A    Y  6-3 RET     1
3    B    Z  6-0 RET     2
4    B    Z  6-1 RET     2
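The same column can likely be produced without a lambda by transforming the boolean Series with the built-in "sum" aggregation. This is a sketch of a swapped-in technique, not the answer's code; it broadcasts each group's count back to every row of that group.

```python
import pandas as pd

df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Boolean flag per row, then the per-group sum aligned back to the original index
has_ret = df["scores"].str.contains("RET", regex=False)
df["count"] = has_ret.groupby([df["catA"], df["catB"]]).transform("sum")
print(df)
```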

