How to check if an id in DF1 appeared within the past 30 minutes in DF2, using Pandas?

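I have DF1 with customer_id, datetime, and fruit purchases, and DF2 with customer_id, datetime, and veggie purchases. How can I check whether a customer made a veggie purchase in the 30 minutes before a fruit purchase?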

df1.head()
customer_id puchase_date fruit_item
1 2019-08-01 23:55:55 Apples
2 2019-08-01 23:58:32 Bananas
df2.head()
customer_id puchase_date veggies_item
1 2019-08-01 23:44:55 Eggplants
2 2019-08-01 22:00:32 Carrots
#after writing the required code and adding a new column to df1
df1.head()
customer_id puchase_date fruit_item bought_veggies_last_30_minutes?
1 2019-08-01 23:55:55 Apples Yes
2 2019-08-01 23:58:32 Bananas No

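You can use merge_asof. By default it matches each row of df1 with the most recent row of df2 at or before its purchase date (for the same customer_id), and the tolerance parameter limits that match to the preceding 30 minutes. Note that you misspelled purchase as puchase. I've kept the same spelling so the code runs against your data without errors.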

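import pandas as pd

# merge_asof needs datetime keys sorted in ascending order on both sides.
out = (pd.merge_asof(df1.assign(puchase_date=pd.to_datetime(df1['puchase_date'])).sort_values(by='puchase_date'),
                    df2.assign(puchase_date=pd.to_datetime(df2['puchase_date'])).sort_values(by='puchase_date'),
                    on='puchase_date',
                    by='customer_id',
                    # Accept a veggie purchase only if it is at most 30 minutes before the fruit purchase.
                    tolerance=pd.Timedelta('30 minute'))
       .rename(columns={'veggies_item':'bought_veggies_last_30_minutes'})
       # A matched veggie row is non-null; turn that into Yes/No.
       .assign(bought_veggies_last_30_minutes=lambda x: x['bought_veggies_last_30_minutes']
               .notna().replace({True: 'Yes', False:'No'})))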

Output:

   customer_id        puchase_date fruit_item bought_veggies_last_30_minutes
0            1 2019-08-01 23:55:55     Apples                            Yes
1            2 2019-08-01 23:58:32    Bananas                             No
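
If you want to try this end to end, here is a minimal self-contained sketch that rebuilds the two sample frames from the question and wraps the same merge_asof call in a helper. The helper name flag_recent_veggies, its window parameter, and the temporary _pos column are my own assumptions for illustration, not part of pandas. Since merge_asof returns rows sorted by the time key, the sketch also puts the flags back in df1's original row order.

```python
import pandas as pd

# Sample data copied from the question (the column keeps the "puchase" spelling).
df1 = pd.DataFrame({
    "customer_id": [1, 2],
    "puchase_date": ["2019-08-01 23:55:55", "2019-08-01 23:58:32"],
    "fruit_item": ["Apples", "Bananas"],
})
df2 = pd.DataFrame({
    "customer_id": [1, 2],
    "puchase_date": ["2019-08-01 23:44:55", "2019-08-01 22:00:32"],
    "veggies_item": ["Eggplants", "Carrots"],
})


def flag_recent_veggies(fruits, veggies, window="30min"):
    """Hypothetical helper: flag fruit rows preceded by a veggie purchase within `window`."""
    # Parse timestamps, remember the caller's row order in _pos, then sort for merge_asof.
    left = fruits.assign(
        puchase_date=pd.to_datetime(fruits["puchase_date"]),
        _pos=range(len(fruits)),
    ).sort_values("puchase_date")
    right = veggies.assign(
        puchase_date=pd.to_datetime(veggies["puchase_date"])
    ).sort_values("puchase_date")

    # direction="backward" is the default; it is spelled out here for clarity.
    merged = pd.merge_asof(
        left, right,
        on="puchase_date", by="customer_id",
        direction="backward", tolerance=pd.Timedelta(window),
    ).sort_values("_pos")  # restore the original row order

    # A non-null veggies_item means a match was found inside the window.
    flag = merged["veggies_item"].notna().map({True: "Yes", False: "No"}).to_numpy()
    return fruits.assign(bought_veggies_last_30_minutes=flag)


result = flag_recent_veggies(df1, df2)
print(result)
```

On the sample data, customer 1 bought eggplants 11 minutes before the apples, so that row is flagged Yes. Customer 2's carrots were bought almost two hours before the bananas, so that row is flagged No.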