Assign a sequential progressive ID whenever a value in a pandas Series changes
I have the following dataframe:
date product_code discount
01/01/2022 1 0.7
01/01/2022 2 0.5
02/01/2022 1 0.1
02/01/2022 1 0.1
02/01/2022 2 0.5
03/01/2022 1 0.4
04/01/2022 1 0.1
04/01/2022 2 0.1
05/01/2022 1 0.1
06/01/2022 1 0.1
06/01/2022 1 0.5
...
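I would like to efficiently assign a sequential, progressive ID to each combination of 'product_code' and discount rate, incrementing it whenever the discount rate changes.
This would give: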
date product_code discount promotion_id
01/01/2022 1 0.7 1
01/01/2022 2 0.5 1
02/01/2022 1 0.1 2
02/01/2022 1 0.1 2
02/01/2022 2 0.5 1
03/01/2022 1 0.4 3
04/01/2022 1 0.1 4
04/01/2022 2 0.1 2
05/01/2022 1 0.1 4
06/01/2022 1 0.1 4
06/01/2022 1 0.5 5
...
To illustrate, for a single product it would be:
date product_code discount promotion_id
01/01/2022 1 0.7 1
02/01/2022 1 0.1 2
02/01/2022 1 0.1 2
03/01/2022 1 0.4 3
04/01/2022 1 0.1 4
05/01/2022 1 0.1 4
06/01/2022 1 0.1 4
06/01/2022 1 0.5 5
...
How can I achieve this?
You can use diff and cumsum inside a groupby:
# transform keeps the original row index, so the result aligns with df on assignment
df['id'] = df.groupby('product_code', sort=False)['discount'].transform(lambda x: x.diff().ne(0).cumsum())
df
Out[644]:
date product_code discount id
0 01/01/2022 1 0.7 1
1 01/01/2022 2 0.5 1
2 02/01/2022 1 0.1 2
3 02/01/2022 1 0.1 2
4 02/01/2022 2 0.5 1
5 03/01/2022 1 0.4 3
6 04/01/2022 1 0.1 4
7 04/01/2022 2 0.1 2
8 05/01/2022 1 0.1 4
9 06/01/2022 1 0.1 4
10 06/01/2022 1 0.5 5
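For reference, here is a self-contained sketch that rebuilds the sample rows from the question and produces the requested promotion_id column. It is a variant of the approach above, not the exact code from the answer: it compares each value with the previous one via shift instead of subtracting with diff, which should also work for non-numeric columns. The helper name run_ids is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "date": ["01/01/2022", "01/01/2022", "02/01/2022", "02/01/2022",
             "02/01/2022", "03/01/2022", "04/01/2022", "04/01/2022",
             "05/01/2022", "06/01/2022", "06/01/2022"],
    "product_code": [1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1],
    "discount": [0.7, 0.5, 0.1, 0.1, 0.5, 0.4, 0.1, 0.1, 0.1, 0.1, 0.5],
})


def run_ids(s: pd.Series) -> pd.Series:
    """Hypothetical helper: number consecutive runs of equal values from 1."""
    # A row starts a new run when it differs from the previous row;
    # the first row compares against NaN and therefore always starts run 1.
    starts_new_run = s.ne(s.shift())
    return starts_new_run.cumsum()


# transform applies run_ids per product and returns a result aligned
# with the original row index, so it can be assigned directly.
df["promotion_id"] = (
    df.groupby("product_code", sort=False)["discount"].transform(run_ids)
)
print(df)
```

This assumes the rows are already in chronological order within each product; if they are not, sort by date first, since the IDs depend on row order.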