Pandas VLOOKUP for two dataframes with NaN values
I have the following dataframe df1:
name mobile_no
0 Hector ABC 123
1 Hector ABC 287
2 Jose JKD 567
3 Luis AH NaN
4 Billy DH NaN
5 Harry AC 569
and another dataframe df2:
download_date mobile_no
0 2021-05-30 123
1 2020-09-28 287
2 2021-02-11 789
3 2021-10-06 321
4 2020-01-15 569
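I want to return the download_date from df2 wherever the mobile number in df1 matches. Somehow, doing a pd.merge doubles the number of rows in df1. Is there a way to check row by row and return the download_date?
I can't drop duplicates (if there are any) in df1, and df1 has more columns than shown. I'd like this to behave like an Excel VLOOKUP, which returns the chosen column's value for a row simply by matching the lookup value. I tried something like: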
df1['download_date'] = np.where(df1.mobile_no == df2.mobile_no, df2.download_date, np.nan)
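The attempt above cannot work as written: `==` between two Series pairs rows up by index label instead of searching df2 for each value, and since df1 (labels 0 to 5) and df2 (labels 0 to 4) have different indexes, pandas raises a ValueError before np.where ever runs. A minimal reproduction, assuming the sample tables are rebuilt by hand with the dates kept as strings:

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt by hand from the tables above (dates kept as strings)
df1 = pd.DataFrame({
    "name": ["Hector ABC", "Hector ABC", "Jose JKD", "Luis AH", "Billy DH", "Harry AC"],
    "mobile_no": [123, 287, 567, np.nan, np.nan, 569],
})
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2020-09-28", "2021-02-11", "2021-10-06", "2020-01-15"],
    "mobile_no": [123, 287, 789, 321, 569],
})

# Series == Series aligns on index labels; the labels differ, so pandas refuses
raised = False
try:
    df1.mobile_no == df2.mobile_no
except ValueError as err:
    raised = True
    print(err)

# A lookup asks "does this value appear anywhere in df2?", which is what isin answers
has_match = df1.mobile_no.isin(df2.mobile_no)
print(has_match.tolist())  # [True, True, False, False, False, True]
```

isin only says whether a match exists; fetching the matching download_date still needs a keyed lookup such as a merge or Series.map.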
Desired result:
name mobile_no download_date
0 Hector ABC 123 2021-05-30
1 Hector ABC 287 2020-09-28
2 Jose JKD 567 NaN
3 Luis AH NaN NaN
4 Billy DH NaN NaN
5 Harry AC 569 2020-01-15
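The row doubling described above usually has one of two causes: the same mobile number appears more than once in df2, or both frames contain NaN keys, which pandas merge matches to each other (unlike SQL). The sketch below uses small hypothetical frames, not the asker's real data, to reproduce both, then shows one guard, assuming that keeping the first download date per number is acceptable:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 123 appears twice in df2, and both sides contain a NaN key
df1 = pd.DataFrame({"name": ["Hector ABC", "Luis AH"], "mobile_no": [123, np.nan]})
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2021-06-01", "2021-07-01"],
    "mobile_no": [123, 123, np.nan],
})

# A left merge emits one row per matching key pair, and NaN keys match each other
merged = df1.merge(df2, on="mobile_no", how="left")
print(len(df1), len(merged))  # 2 3

# Guard: drop NaN and duplicate keys from df2, then let pandas verify uniqueness
lookup = df2.dropna(subset=["mobile_no"]).drop_duplicates("mobile_no")
safe = df1.merge(lookup, on="mobile_no", how="left", validate="many_to_one")
print(len(safe))  # 2
```

With validate="many_to_one", a df2 that still has repeated keys makes the merge fail loudly instead of silently adding rows to df1.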
Use merge together with pd.concat:
m = df1.mobile_no.isna()
merged_df = pd.concat([df1.loc[m], df1.loc[~m].merge(df2, on='mobile_no', how='left')]).sort_index()
Output:
name mobile_no download_date
0 Hector ABC 123 2021-05-30
1 Hector ABC 287 2020-09-28
2 Jose JKD 567 NaN
3 Luis AH <NA> NaN
3 Harry AC 569 2020-01-15
4 Billy DH <NA> NaN
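In the output above the label 3 appears twice: DataFrame.merge does not keep the left frame's index, so the matched part comes back with a fresh 0..n-1 index that collides with the labels of the NaN rows. A sketch of the same merge-and-concat idea that carries df1's original labels through the merge, assuming df1 has the default unnamed index (which reset_index turns into a column called "index"):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Hector ABC", "Hector ABC", "Jose JKD", "Luis AH", "Billy DH", "Harry AC"],
    "mobile_no": [123, 287, 567, np.nan, np.nan, 569],
})
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2020-09-28", "2021-02-11", "2021-10-06", "2020-01-15"],
    "mobile_no": [123, 287, 789, 321, 569],
})

m = df1.mobile_no.isna()
# Stash df1's labels in a column before merging, then restore them as the index
matched = (
    df1.loc[~m]
    .reset_index()
    .merge(df2, on="mobile_no", how="left")
    .set_index("index")
)
merged_df = pd.concat([df1.loc[m], matched]).sort_index()
merged_df.index.name = None
print(merged_df)
```

Every row of df1 now keeps its own label, so the result lines up with df1 row for row (as long as df2 has no repeated mobile numbers).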
What you're looking for is Series.map:
df1["download_date"] = df1["mobile_no"].map(df2.set_index("mobile_no")["download_date"])
print(df1)
name mobile_no download_date
0 Hector ABC 123.0 2021-05-30
1 Hector ABC 287.0 2020-09-28
2 Jose JKD 567.0 NaN
3 Luis AH NaN NaN
4 Billy DH NaN NaN
5 Harry AC 569.0 2020-01-15
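One caveat with Series.map: the lookup Series built from df2 must have unique mobile numbers, otherwise the map call raises an error (pandas will not reindex against a non-unique index) instead of picking a match. If df2 can contain the same number more than once, one option is to de-duplicate it first; keeping the first occurrence loosely mirrors VLOOKUP returning the first match. The frames below are hypothetical:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "name": ["Hector ABC", "Jose JKD", "Luis AH"],
    "mobile_no": [123, 567, np.nan],
})
# Hypothetical df2 in which 123 appears twice
df2 = pd.DataFrame({
    "download_date": ["2021-05-30", "2021-06-15", "2020-01-15"],
    "mobile_no": [123, 123, 569],
})

# Keep the first row per number so the lookup index is unique
lookup = df2.drop_duplicates("mobile_no").set_index("mobile_no")["download_date"]
# Unmatched numbers and NaN numbers both map to NaN; df1 keeps its row count
df1["download_date"] = df1["mobile_no"].map(lookup)
print(df1)
```

Passing keep="last" to drop_duplicates would pick the last listed download date per number instead.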