pandas groupby: compare a string value with the previous row's value and flag changes in new columns

I have this demo df:

import pandas as pd

info = {'customer': ['Jason', 'Jason', 'Jason', 'Jason',
                     'Molly', 'Molly', 'Molly', 'Molly'],
        'Good': ['Cookie', 'Cookie', 'Cookie', 'Cookie',
                 'Ice Cream', 'Ice Cream', 'Ice Cream', 'Ice Cream'],
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(data=info)
df

which gives:

   customer   Good      Date        Flavor
0   Jason   Cookie      2021-12-14  Chocolate
1   Jason   Cookie      2022-01-04  Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

I am trying to track the flavor changes per customer per good in new columns From and To. I have done the grouping part:

   df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()

and I get:

 customer  Good       Date      
    Jason     Cookie     2021-12-14     Chocolate
                         2022-01-04       Vanilla
                         2022-01-11       Vanilla
                         2022-01-18    Strawberry
    Molly     Ice Cream  2022-01-12     Chocolate
                         2022-01-15       Vanilla
                         2022-01-19       Caramel
                         2022-01-30       Caramel
    Name: Flavor, dtype: object

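The first row of each group is the entry point. After that, I want to compare each row with the previous row in the same group: if the flavor differs, record the change in the new columns (From and To); if it is the same, nothing happens.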

I have tried several approaches and pieces of code, but unfortunately I don't know the best way to do this.

Expected output, after reset_index():

  customer   Good      Date        Flavor           From         To
0   Jason   Cookie      2021-12-14  Chocolate    
1   Jason   Cookie      2022-01-04  Vanilla         Chocolate    Vanilla
2   Jason   Cookie      2022-01-11  Vanilla
3   Jason   Cookie      2022-01-18  Strawberry      Vanilla      Strawberry
4   Molly   Ice Cream   2022-01-12  Chocolate
5   Molly   Ice Cream   2022-01-15  Vanilla         Chocolate    Vanilla
6   Molly   Ice Cream   2022-01-19  Caramel         Vanilla      Caramel
7   Molly   Ice Cream   2022-01-30  Caramel

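Building on the sum you created (named g), we can groupby the first two levels of the index and shift it, then join it back to g. After rename-ing the shifted column to "From", mask the "To" and "From" columns wherever nothing changed or the previous value is NaN. Finally, select the columns and reset_index to get a flat DataFrame: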

g = df.sort_values(['Date']).groupby(['customer','Good','Date'])['Flavor'].sum()
joined = g.to_frame().assign(To=g).join(g.groupby(level=[0,1]).shift().to_frame(), lsuffix='', rsuffix='_').rename(columns={'Flavor_':'From'})
joined.update(joined[['To','From']].mask(joined['From'].isna() | joined['From'].eq(joined['To']), ''))
out = joined[['Flavor','From','To']].reset_index()

Output:

  customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate                       
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla                       
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate                       
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel                       
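
To make the shift step easier to follow, here is a small sketch that rebuilds g from the demo data and prints only the shifted Series. The name prev is an illustrative assumption, not part of the code above.

```python
import pandas as pd

info = {'customer': ['Jason'] * 4 + ['Molly'] * 4,
        'Good': ['Cookie'] * 4 + ['Ice Cream'] * 4,
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(info)

# Same grouped Series as in the answer.
g = df.sort_values(['Date']).groupby(['customer', 'Good', 'Date'])['Flavor'].sum()

# Shift within each (customer, Good) pair: every row receives the flavor
# of the previous date in the same pair; the first row of a pair gets NaN.
prev = g.groupby(level=[0, 1]).shift()
print(prev)
```

Because the shift is done per (customer, Good) pair, Molly's first row does not inherit Jason's last flavor; it is NaN, which is why the mask also checks isna().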
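
Another option, using shift within each group directly on df:

s = df.assign(
    # previous flavor within each (customer, Good) group, in date order
    From=df.sort_values(by='Date').groupby(['customer', 'Good'])['Flavor'].shift(),
    To=df['Flavor']
).dropna()

# keep From/To only where the flavor actually changed, then join back on the index
out = df.join(s[s['From'] != s['To']].iloc[:, -2:]).fillna('')

Output:
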
   customer       Good        Date      Flavor       From          To
0    Jason     Cookie  2021-12-14   Chocolate                       
1    Jason     Cookie  2022-01-04     Vanilla  Chocolate     Vanilla
2    Jason     Cookie  2022-01-11     Vanilla                       
3    Jason     Cookie  2022-01-18  Strawberry    Vanilla  Strawberry
4    Molly  Ice Cream  2022-01-12   Chocolate                       
5    Molly  Ice Cream  2022-01-15     Vanilla  Chocolate     Vanilla
6    Molly  Ice Cream  2022-01-19     Caramel    Vanilla     Caramel
7    Molly  Ice Cream  2022-01-30     Caramel
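
For comparison, the same idea can be sketched without the intermediate sum or join, using groupby(...).shift() and where. The helper name track_changes and its parameters are hypothetical, not taken from the answers above, and the sketch assumes the Date strings are ISO-formatted so that sorting them as text is chronological.

```python
import pandas as pd

def track_changes(df, keys=("customer", "Good"), col="Flavor", date="Date"):
    """Hypothetical helper: add From/To columns where `col` changes within a group."""
    keys = list(keys)
    # Sort so that "previous row" means "previous date" inside each group.
    out = df.sort_values(keys + [date]).copy()
    prev = out.groupby(keys)[col].shift()
    # A change needs a previous value that differs from the current one.
    changed = prev.notna() & prev.ne(out[col])
    out["From"] = prev.where(changed, "")
    out["To"] = out[col].where(changed, "")
    # Restore the original row order.
    return out.sort_index()

info = {'customer': ['Jason'] * 4 + ['Molly'] * 4,
        'Good': ['Cookie'] * 4 + ['Ice Cream'] * 4,
        'Date': ['2021-12-14', '2022-01-04', '2022-01-11', '2022-01-18',
                 '2022-01-12', '2022-01-15', '2022-01-19', '2022-01-30'],
        'Flavor': ['Chocolate', 'Vanilla', 'Vanilla', 'Strawberry',
                   'Chocolate', 'Vanilla', 'Caramel', 'Caramel']}
df = pd.DataFrame(info)
out = track_changes(df)
print(out)
```

If Date is not stored as ISO text, converting it with pd.to_datetime before sorting would likely be the safer choice.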