Pandas VLOOKUP for two dataframes with NaN values

I have the following dataframe, df1:

    name           mobile_no      
0   Hector ABC       123       
1   Hector ABC       287        
2   Jose JKD         567      
3   Luis AH          NaN      
4   Billy DH         NaN 
5   Harry AC         569

And another dataframe, df2:

    download_date  mobile_no      
0   2021-05-30       123        
1   2020-09-28       287      
2   2021-02-11       789        
3   2021-10-06       321        
4   2020-01-15       569      

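I want to return the download date from df2 wherever the mobile number in df1 matches. Doing a pd.merge somehow doubles the number of rows in df1. Is there a way to check row by row and return the download_date? I cannot drop duplicates in df1 (if there are any), and df1 has more columns. I would like it to behave like an Excel VLOOKUP, which returns the value from a chosen column for that row simply by matching the lookup value. I have tried something like: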

df1['download_date'] = np.where(df1.mobile_no == df2.mobile_no, df2.download_date, np.nan)

Desired result:

    name         mobile_no    download_date
0   Hector ABC      123        2021-05-30
1   Hector ABC      287        2020-09-28
2   Jose JKD        567           NaN
3   Luis AH         NaN           NaN
4   Billy DH        NaN           NaN
5   Harry AC        569        2020-01-15

Use merge together with pd.concat:

m = df1.mobile_no.isna()
merged_df = pd.concat([df1.loc[m], df1.loc[~m].merge(df2, on='mobile_no', how='left')]).sort_index()

Output:

         name  mobile_no download_date
0  Hector ABC        123    2021-05-30
1  Hector ABC        287    2020-09-28
2    Jose JKD        567           NaN
3     Luis AH       <NA>           NaN
3    Harry AC        569    2020-01-15
4    Billy DH       <NA>           NaN
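
Note that the output above has the label 3 twice and Harry AC ahead of Billy DH: merge returns a fresh 0..n-1 index, so sort_index no longer restores the original row order. A minimal sketch of one way to keep df1's labels is shown below. It assumes mobile_no is stored as the nullable Int64 dtype in both frames (as the <NA> in the output suggests); the name `matched` is only illustrative.

```python
import pandas as pd

# Sample data from the question; the Int64 dtype is an assumption.
df1 = pd.DataFrame({
    "name": ["Hector ABC", "Hector ABC", "Jose JKD",
             "Luis AH", "Billy DH", "Harry AC"],
    "mobile_no": pd.array([123, 287, 567, None, None, 569], dtype="Int64"),
})
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2020-09-28", "2021-02-11",
                      "2021-10-06", "2020-01-15"],
    "mobile_no": pd.array([123, 287, 789, 321, 569], dtype="Int64"),
})

m = df1.mobile_no.isna()

# Carry the original labels through the merge as an ordinary column,
# then turn that column back into the index afterwards.
matched = (
    df1.loc[~m]
    .reset_index()                      # original labels -> column "index"
    .merge(df2, on="mobile_no", how="left")
    .set_index("index")
)

# Rows with a missing mobile_no are added back untouched.
merged_df = pd.concat([df1.loc[m], matched]).sort_index().rename_axis(None)
print(merged_df)
```

With the labels preserved, sort_index puts the rows back in df1's original order.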

What you are looking for is Series.map:

df1["download_date"] = df1["mobile_no"].map(df2.set_index("mobile_no")["download_date"])
print(df1)

         name  mobile_no download_date
0 Hector  ABC      123.0    2021-05-30
1 Hector  ABC      287.0    2020-09-28
2 Jose    JKD      567.0           NaN
3 Luis     AH        NaN           NaN
4 Billy    DH        NaN           NaN
5 Harry    AC      569.0    2020-01-15
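
For reference, here is a self-contained sketch of the map approach using the sample data from the question. One caveat: map needs the lookup Series to have a unique index, so the sketch drops repeated mobile_no values in df2 first. Keeping the first occurrence is an assumption about what is wanted, not something stated in the question, and the name `lookup` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "name": ["Hector ABC", "Hector ABC", "Jose JKD",
             "Luis AH", "Billy DH", "Harry AC"],
    "mobile_no": [123, 287, 567, np.nan, np.nan, 569],
})
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2020-09-28", "2021-02-11",
                      "2021-10-06", "2020-01-15"],
    "mobile_no": [123, 287, 789, 321, 569],
})

# Build a mobile_no -> download_date lookup Series.
# drop_duplicates is a guard (assumption: keep the first date per number);
# it is a no-op on this sample because df2 has no repeated numbers.
lookup = (
    df2.drop_duplicates("mobile_no")
    .set_index("mobile_no")["download_date"]
)

# map looks up each value of df1.mobile_no; unmatched and NaN keys give NaN.
df1["download_date"] = df1["mobile_no"].map(lookup)
print(df1)
```

Because map works value by value on df1's own column, the row count and index of df1 stay unchanged, which is the VLOOKUP-like behaviour asked for.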