Display a column with an indicator/flag when conditions are met between different columns in pandas dataframe (no-merge)
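Hi all. I have a dataframe containing regions, customers, and some deliveries with their prices. The purchaseType column marks each customer's first and last purchase as 'FIRST' and 'LAST', and any purchases in between as 'DELIVERY'. I need a flag column, as shown in the expected output below, that indicates whether the FIRST and LAST prices are equal for each region/customer pair. All rows must remain in the output.
I have already solved this with merge, but I would like to know whether there is a way to do it without merge, because my solution does not look efficient.
Thanks for your time.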
Sample data:
import pandas as pd
data = [['NY', 'A','FIRST', 25], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['FL', 'A','DELIVERY', 12], ['FL', 'A','DELIVERY', 25], ['FL', 'A','LAST', 15],
['FL', 'C','FIRST', 15], ['FL', 'C','LAST', 10],
['FL', 'D','FIRST', 10], ['FL', 'D','DELIVERY', 20], ['FL', 'D','LAST', 30],
['FL', 'E','FIRST', 20], ['FL', 'E','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['region', 'customer', 'purchaseType', 'price'])
# print dataframe.
print(df)
    region  customer  purchaseType  price
0   NY      A         FIRST         25
1   NY      A         DELIVERY      20
2   NY      A         DELIVERY      30
3   NY      A         LAST          25
4   NY      B         FIRST         15
5   NY      B         DELIVERY      10
6   NY      B         LAST          20
7   FL      A         FIRST         15
8   FL      A         DELIVERY      10
9   FL      A         DELIVERY      12
10  FL      A         DELIVERY      25
11  FL      A         LAST          15
12  FL      C         FIRST         15
13  FL      C         LAST          10
14  FL      D         FIRST         10
15  FL      D         DELIVERY      20
16  FL      D         LAST          30
17  FL      E         FIRST         20
18  FL      E         LAST          20
Expected output:
    region  customer  purchaseType  price  firstLastEqual
0   NY      A         FIRST         25     True
1   NY      A         DELIVERY      20     True
2   NY      A         DELIVERY      30     True
3   NY      A         LAST          25     True
4   NY      B         FIRST         15     False
5   NY      B         DELIVERY      10     False
6   NY      B         LAST          20     False
7   FL      A         FIRST         15     True
8   FL      A         DELIVERY      10     True
9   FL      A         DELIVERY      12     True
10  FL      A         DELIVERY      25     True
11  FL      A         LAST          15     True
12  FL      C         FIRST         15     False
13  FL      C         LAST          10     False
14  FL      D         FIRST         10     False
15  FL      D         DELIVERY      20     False
16  FL      D         LAST          30     False
17  FL      E         FIRST         20     True
18  FL      E         LAST          20     True
My solution using merge:
df_first = df[df['purchaseType'] == 'FIRST']
df_last = df[df['purchaseType'] == 'LAST']
df_compare = df_first.merge(df_last, how='inner', left_on=['region','customer'], right_on=['region','customer'])
df_compare = df_compare[df_compare['price_x'] == df_compare['price_y']]
df_compare['firstLastEqual'] = True
df = df.merge(df_compare, how='left', left_on=['region','customer'], right_on=['region','customer'])
df['firstLastEqual'] = df['firstLastEqual'].fillna(False)
df = df.drop(['purchaseType_x', 'price_x', 'purchaseType_y', 'price_y'], axis=1)
print(df)
I would like to know whether this can be done without a merge.
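One merge-free possibility, offered as a sketch rather than a definitive answer: mask the price so that only the FIRST (or LAST) row keeps its value, then use groupby(...).transform('first') to broadcast that value to every row of the same region/customer group and compare the two results. The names keys, first_price and last_price are illustrative and not part of the original code. The sketch assumes each region/customer pair has at most one FIRST and one LAST row; a group missing either one ends up flagged False.

```python
import pandas as pd

data = [['NY', 'A', 'FIRST', 25], ['NY', 'A', 'DELIVERY', 20], ['NY', 'A', 'DELIVERY', 30], ['NY', 'A', 'LAST', 25],
        ['NY', 'B', 'FIRST', 15], ['NY', 'B', 'DELIVERY', 10], ['NY', 'B', 'LAST', 20],
        ['FL', 'A', 'FIRST', 15], ['FL', 'A', 'DELIVERY', 10], ['FL', 'A', 'DELIVERY', 12], ['FL', 'A', 'DELIVERY', 25], ['FL', 'A', 'LAST', 15],
        ['FL', 'C', 'FIRST', 15], ['FL', 'C', 'LAST', 10],
        ['FL', 'D', 'FIRST', 10], ['FL', 'D', 'DELIVERY', 20], ['FL', 'D', 'LAST', 30],
        ['FL', 'E', 'FIRST', 20], ['FL', 'E', 'LAST', 20]]
df = pd.DataFrame(data, columns=['region', 'customer', 'purchaseType', 'price'])

keys = ['region', 'customer']  # illustrative name for the grouping columns

# Keep the price only on FIRST (or LAST) rows; every other row becomes NaN.
masked = df.assign(
    first_price=df['price'].where(df['purchaseType'] == 'FIRST'),
    last_price=df['price'].where(df['purchaseType'] == 'LAST'),
)

# 'first' picks the first non-null value per group, and transform
# broadcasts it back to every row of that group (same index as df).
per_group = masked.groupby(keys)[['first_price', 'last_price']].transform('first')

# NaN never compares equal, so groups lacking a FIRST or LAST row get False.
df['firstLastEqual'] = per_group['first_price'].eq(per_group['last_price'])

print(df)
```

Because transform returns a result aligned with the original index, no join back onto df is needed and the row order is left as it was.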