Insert groupby result into new column once per instance

Question

Given this DataFrame:

new
  Color  Value
0   Red    100
1   Red    150
2  Blue     50

I insert the count of repeated values into a new column:

new['Repeats'] = new.groupby(['Color'])[new.columns[-1]].transform('count')

This results in:

  Color  Value  Repeats
0   Red    100        2
1   Red    150        2
2  Blue     50        1

Is there a way to get the same result, but with 'Repeats' entered only once per instance, like this:

  Color  Value  Repeats
0   Red    100        2
1   Red    150        
2  Blue     50        1

This seems silly to me, but a client has asked for it.

Thanks in advance for your help.

Answer 1

After performing the transform, use loc and duplicated to set the repeated rows to an empty string:

new.loc[new['Color'].duplicated(), 'Repeats'] = ''

The resulting output:

  Color  Value Repeats
0   Red    100       2
1   Red    150        
2  Blue     50       1
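
For reference, here is a minimal end-to-end sketch that combines the transform from the question with the assignment above. The DataFrame is rebuilt from the sample data in the question. The cast to object dtype is an assumption added here, not part of the original answer: recent pandas versions may warn or fail when a string is written into an integer column.

```python
import pandas as pd

# Sample data copied from the question.
new = pd.DataFrame({"Color": ["Red", "Red", "Blue"], "Value": [100, 150, 50]})

# Count of rows per Color, broadcast back to every row of the group.
new["Repeats"] = new.groupby("Color")["Value"].transform("count")

# Assumption: cast to object first so the column can hold both ints and ''.
new["Repeats"] = new["Repeats"].astype(object)

# duplicated() marks every occurrence of a Color after the first one,
# so only the first row of each group keeps its count.
new.loc[new["Color"].duplicated(), "Repeats"] = ""

print(new)
```

Because duplicated() defaults to keep='first', the count stays on the first row of each color and is blanked on the later ones.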

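Note that you could also set the duplicates to np.nan, but you would first need to convert the 'Repeats' column to string dtype; otherwise the counts are converted to floats.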
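
A rough sketch of that np.nan variant, assuming the same sample data as in the question. The intermediate name `counts` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
new = pd.DataFrame({"Color": ["Red", "Red", "Blue"], "Value": [100, 150, 50]})

# Per-group row counts, aligned to the original rows.
counts = new.groupby("Color")["Value"].transform("count")

# Store the counts as strings so inserting NaN does not turn them into floats.
new["Repeats"] = counts.astype(str)

# Blank out every repeated Color after its first occurrence with NaN.
new.loc[new["Color"].duplicated(), "Repeats"] = np.nan

print(new)
```

The trade-off is that 'Repeats' then holds strings such as '2' rather than integers, which is fine for display but not for further arithmetic.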
