Mapping data from another dataset. Python Pandas
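So I have 2 datasets: df1 has the colors of all the fruits, while df2 does not. How can I fill in the Color values of df2 from the color data in df1, matching on the fruit name?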
df1:
Name        Color
Apple       Red
Orange      Orange
Pear        Pear
Pear        Pear
Papaya      Papaya
Watermelon  Watermelon
"           "

df2:
Name          Color
Orange        Na
Coconut       Na
Pear          Na
Strawberries  Na
Banana        Na
Papaya        Na
"             "
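To make the answers reproducible, the sample tables can be rebuilt as pandas DataFrames. This is a sketch that assumes "Na" means a missing value (NaN) and leaves out the trailing `"` rows:

```python
import numpy as np
import pandas as pd

# df1: reference table with a color for every fruit (Pear is listed twice)
df1 = pd.DataFrame({
    "Name": ["Apple", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
    "Color": ["Red", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
})

# df2: fruits whose Color is still missing ("Na" in the question, NaN here)
df2 = pd.DataFrame({
    "Name": ["Orange", "Coconut", "Pear", "Strawberries", "Banana", "Papaya"],
    "Color": np.nan,
})
print(df1)
print(df2)
```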
I think you can use map, but first you need Series.drop_duplicates:
df2['Color'] = df2['Name'].map(df1.set_index('Name')['Color'].drop_duplicates())
print (df2)
Name Color
0 Orange Orange
1 Coconut NaN
2 Pear Pear
3 Strawberries NaN
4 Banana NaN
5 Papaya Papaya
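Note that Series.drop_duplicates compares only the Color values, not the names. The duplicates have to go because Series.map refuses a lookup Series whose index repeats (Pear here). With the question's data every fruit has a distinct color, so the answer works, but if two different fruits shared a color, one of them would silently vanish from the lookup. A sketch that deduplicates on Name instead; the rows below are made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Apple and Strawberries share the color "Red"
df1 = pd.DataFrame({
    "Name": ["Apple", "Strawberries", "Pear", "Pear"],
    "Color": ["Red", "Red", "Green", "Green"],
})
df2 = pd.DataFrame({"Name": ["Strawberries", "Pear", "Coconut"], "Color": np.nan})

# Series.drop_duplicates looks only at the values, so Strawberries ("Red") is dropped
by_value = df1.set_index("Name")["Color"].drop_duplicates()

# Dropping duplicated *names* instead keeps exactly one row per fruit
by_name = df1.drop_duplicates("Name").set_index("Name")["Color"]

# Names missing from the lookup (Coconut) become NaN
df2["Color"] = df2["Name"].map(by_name)
print(df2)
```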
Another solution with merge, using DataFrame.drop_duplicates and DataFrame.drop:
df2 = pd.merge(df2.drop('Color', axis=1),df1.drop_duplicates(), how='left')
print (df2)
Name Color
0 Orange Orange
1 Coconut NaN
2 Pear Pear
3 Strawberries NaN
4 Banana NaN
5 Papaya Papaya
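If you want the merge itself to guard against repeated names, DataFrame.merge / pd.merge accept validate="many_to_one", which raises pandas.errors.MergeError when the right-hand keys are not unique. A sketch on the question's data, rebuilt here so the block runs on its own:

```python
import numpy as np
import pandas as pd

# Question's sample data
df1 = pd.DataFrame({
    "Name": ["Apple", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
    "Color": ["Red", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
})
df2 = pd.DataFrame({
    "Name": ["Orange", "Coconut", "Pear", "Strawberries", "Banana", "Papaya"],
    "Color": np.nan,
})

# Without deduplication Pear is repeated in df1, so validation fails
raised = False
try:
    pd.merge(df2.drop(columns="Color"), df1, on="Name", how="left",
             validate="many_to_one")
except pd.errors.MergeError as exc:
    raised = True
    print("Merge rejected:", exc)

# After drop_duplicates every Name in df1 is unique, so the merge is accepted
# and the result keeps exactly one row per row of df2
out = pd.merge(df2.drop(columns="Color"), df1.drop_duplicates(), on="Name",
               how="left", validate="many_to_one")
print(out)
```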
You can use merge:
df2 = df2.merge(df1, on="Name", how="left", suffixes=('_1','_2'))
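With this merge both frames contribute a Color column, so the result carries Color_1 and Color_2, and because Pear is listed twice in df1 it is also duplicated in the output. A sketch on the question's data showing one way to tidy that up; keeping df1's colors assumes df2's own Color column is entirely empty, and the final drop_duplicates would also collapse rows that were already identical in df2:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Apple", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
    "Color": ["Red", "Orange", "Pear", "Pear", "Papaya", "Watermelon"],
})
df2 = pd.DataFrame({
    "Name": ["Orange", "Coconut", "Pear", "Strawberries", "Banana", "Papaya"],
    "Color": np.nan,
})

raw = df2.merge(df1, on="Name", how="left", suffixes=("_1", "_2"))
# Color_1 comes from df2 (all NaN), Color_2 from df1.
# Pear occurs twice in df1, so it occurs twice here as well.
print(raw.columns.tolist(), len(raw))

# Keep df1's colors under the original column name and drop the repeated Pear row
tidy = (
    raw.drop(columns="Color_1")
       .rename(columns={"Color_2": "Color"})
       .drop_duplicates()
       .reset_index(drop=True)
)
print(tidy)
```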
If Name is your index column, you can simply do a join:
df2 = df2.drop(columns='Color').join(df1[['Color']])
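A self-contained sketch of the join route, assuming both frames are indexed by Name. The existing Color column in df2 has to be dropped first (otherwise join complains that the columns overlap), and repeated labels in df1's index have to be removed, or Pear would appear twice in the result:

```python
import numpy as np
import pandas as pd

# Assumption: both frames use Name as their index
df1 = pd.DataFrame(
    {"Color": ["Red", "Orange", "Pear", "Pear", "Papaya", "Watermelon"]},
    index=pd.Index(["Apple", "Orange", "Pear", "Pear", "Papaya", "Watermelon"], name="Name"),
)
df2 = pd.DataFrame(
    {"Color": np.nan},
    index=pd.Index(["Orange", "Coconut", "Pear", "Strawberries", "Banana", "Papaya"], name="Name"),
)

# Keep the first row for each repeated name in the lookup table
lookup = df1[~df1.index.duplicated(keep="first")]

# Left join on the index; fruits missing from df1 get NaN
df2 = df2.drop(columns="Color").join(lookup[["Color"]])
print(df2)
```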
For a more complete example, see the other answer above/below, which explains my answer in enough detail.