Display a column with an indicator/flag when conditions are met between different columns in pandas dataframe (no-merge)

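Hi all. I have a dataframe with regions, customers, and some deliveries along with their prices. The purchaseType column marks each customer's first and last purchase as 'FIRST' and 'LAST', and sometimes there are intermediate deliveries marked as 'DELIVERY'. I need a flag column (firstLastEqual in the expected output below) that shows whether the FIRST and LAST prices match for each region/customer pair. All rows must be kept in the result.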

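I have already solved this using merge, but I would like to know whether there is a way to do it without merge, since that approach does not look very efficient. Thanks for your time.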

Sample data:


import pandas as pd  
data = [['NY', 'A','FIRST', 25], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
       ['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
       ['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 15],
       ['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
       ['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
       ['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
       ] 
  
# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price']) 
  
# print dataframe. 
print(df)

   region customer purchaseType  price
0      NY        A        FIRST     25
1      NY        A     DELIVERY     20
2      NY        A     DELIVERY     30
3      NY        A         LAST     25
4      NY        B        FIRST     15
5      NY        B     DELIVERY     10
6      NY        B         LAST     20
7      FL        A        FIRST     15
8      FL        A     DELIVERY     10
9      FL        A     DELIVERY     12
10     FL        A     DELIVERY     25
11     FL        A         LAST     15
12     FL        C        FIRST     15
13     FL        C         LAST     10
14     FL        D        FIRST     10
15     FL        D     DELIVERY     20
16     FL        D         LAST     30
17     FL        E        FIRST     20
18     FL        E         LAST     20

Expected output:

   region customer purchaseType  price  firstLastEqual
0      NY        A        FIRST     25            True
1      NY        A     DELIVERY     20            True
2      NY        A     DELIVERY     30            True
3      NY        A         LAST     25            True
4      NY        B        FIRST     15           False
5      NY        B     DELIVERY     10           False
6      NY        B         LAST     20           False
7      FL        A        FIRST     15            True
8      FL        A     DELIVERY     10            True
9      FL        A     DELIVERY     12            True
10     FL        A     DELIVERY     25            True
11     FL        A         LAST     15            True
12     FL        C        FIRST     15           False
13     FL        C         LAST     10           False
14     FL        D        FIRST     10           False
15     FL        D     DELIVERY     20           False
16     FL        D         LAST     30           False
17     FL        E        FIRST     20            True
18     FL        E         LAST     20            True

My solution using merge:

# Split out the FIRST and LAST rows
df_first = df[df['purchaseType'] == 'FIRST']
df_last = df[df['purchaseType'] == 'LAST']
# Pair each FIRST row with the LAST row of the same region/customer
df_compare = df_first.merge(df_last, how='inner', on=['region', 'customer'])
# Keep only the pairs whose prices match and flag them
df_compare = df_compare[df_compare['price_x'] == df_compare['price_y']]
df_compare['firstLastEqual'] = True
# Bring the flag back onto every row; unmatched groups become False
df = df.merge(df_compare, how='left', on=['region', 'customer'])
df['firstLastEqual'] = df['firstLastEqual'].fillna(False)
df = df.drop(['purchaseType_x', 'price_x', 'purchaseType_y', 'price_y'], axis=1)
print(df)

Is it possible to do this without a merge?
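
One possible merge-free approach, offered as a sketch rather than a definitive answer: mask the price so it survives only on the FIRST (or LAST) rows, then broadcast that value to every row of its region/customer group with groupby(...).transform('first'), which skips NaN. The variable names first_price, last_price and keys are illustrative, and the sketch assumes each group has at most one FIRST row and one LAST row.

```python
import pandas as pd

data = [['NY', 'A', 'FIRST', 25], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'DELIVERY', 30], ['NY', 'A', 'LAST', 25],
        ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY', 10], ['NY', 'B', 'LAST', 20],
        ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY', 10], ['FL', 'A', 'DELIVERY', 12], ['FL', 'A', 'DELIVERY', 25], ['FL', 'A', 'LAST', 15],
        ['FL', 'C', 'FIRST', 15], ['FL', 'C', 'LAST', 10],
        ['FL', 'D', 'FIRST', 10], ['FL', 'D', 'DELIVERY', 20], ['FL', 'D', 'LAST', 30],
        ['FL', 'E', 'FIRST', 20], ['FL', 'E', 'LAST', 20]]
df = pd.DataFrame(data, columns=['region', 'customer', 'purchaseType', 'price'])

# Keep the price only on FIRST / LAST rows; every other row becomes NaN.
first_price = df['price'].where(df['purchaseType'] == 'FIRST')
last_price = df['price'].where(df['purchaseType'] == 'LAST')

# Group by region and customer (illustrative name; assumption: these two
# columns identify a customer's purchase sequence, as in the merge version).
keys = [df['region'], df['customer']]

# The 'first' aggregation skips NaN, so each row receives the single
# non-null FIRST (or LAST) price of its group, aligned to df's index.
first_per_group = first_price.groupby(keys).transform('first')
last_per_group = last_price.groupby(keys).transform('first')

# NaN == NaN is False, so a group missing FIRST or LAST is flagged False.
df['firstLastEqual'] = first_per_group == last_per_group
print(df)
```

If the rows are guaranteed to be ordered so that FIRST is the first row and LAST is the last row of every group, comparing transform('first') with transform('last') on df.groupby(['region', 'customer'])['price'] should give the same flag without looking at purchaseType at all. Whether either version is actually faster than the merge would need to be timed on the real data.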