Add an index in pandas based on each occurrence of a specific value in another column

Question

I have a DataFrame like this:

category name   age 
parent  harry   29
child   smith   12
parent  sally   41
child   david   19
child   mike    16

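I want to add a column that groups the rows by each occurrence of the value 'parent' in the category column (the DataFrame is already in order). For example: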

category name   age  family_id
parent  harry   29     0
child   smith   12     0
parent  sally   41     1
child   david   19     1
child   mike    16     1

I am trying to make family_id an incrementing integer.

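I have tried many groupby approaches and am currently writing my own apply function, but it is very slow and does not work as expected. I have not been able to find an example that groups rows by each occurrence of the same value in a column.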

Answer 1

You can use eq to test whether the category column equals 'parent', then cumsum to keep a running count of the matches. Because cumsum starts at 1 here, use sub to subtract 1:

df['family_id'] = df['category'].eq('parent').cumsum().sub(1)
print(df)

  category   name  age  family_id
0   parent  harry   29          0
1    child  smith   12          0
2   parent  sally   41          1
3    child  david   19          1
4    child   mike   16          1
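
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question by hand and splits the one-liner into steps; the intermediate names is_parent and family_sizes are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "category": ["parent", "child", "parent", "child", "child"],
    "name": ["harry", "smith", "sally", "david", "mike"],
    "age": [29, 12, 41, 19, 16],
})

# Boolean Series: True on every 'parent' row, False elsewhere.
is_parent = df["category"].eq("parent")

# Running count of 'parent' rows seen so far (1-based), shifted to start at 0.
df["family_id"] = is_parent.cumsum().sub(1)

# The new column works as an ordinary groupby key, e.g. rows per family.
family_sizes = df.groupby("family_id").size()

print(df)
print(family_sizes)
```

Note that this relies on each family starting with its 'parent' row. Any rows that appear before the first 'parent' would get family_id -1, since the running count is still 0 at that point.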

Tags: python, pandas, pandasql, pandas-groupby