Elegant way to replace values in pandas.DataFrame from another DataFrame
I have a DataFrame, and I want to replace the values in one column with values from another DataFrame.
import pandas as pd

df = pd.DataFrame({'id1': [1001,1002,1001,1003,1004,1005,1002,1006],
                   'value1': ["a","b","c","d","e","f","g","h"],
                   'value3': ["yes","no","yes","no","no","no","yes","no"]})
dfReplace = pd.DataFrame({'id2': [1001,1002],
                          'value2': ["rep1","rep2"]})
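The replacement has to go through a groupby on the common key, and my current solution uses a loop. Is there a more elegant (and faster) way to do this, with .map, .apply, or similar? I initially wanted to use pd.update(), but that doesn't seem to be the right approach.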
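# Group by the scalar key so that `key` is a plain id rather than a 1-tuple
groups = dfReplace.groupby('id2')
for key, group in groups:
    # Overwrite value1 in every row of df whose id1 matches this key
    df.loc[df['id1'] == key, 'value1'] = group['value2'].values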
Output:
df
    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
Try merge():
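# Left join keeps every row of df; unmatched rows get NaN in id2/value2
merge = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')
print(merge)
# .ix was removed in pandas 1.0; .loc with a boolean mask does the same job here
merge.loc[merge.id1 == merge.id2, 'value1'] = merge.value2
print(merge)
# Drop the helper columns that the merge brought in
del merge['id2']
del merge['value2']
print(merge)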
Output:
    id1 value1 value3   id2 value2
0  1001      a    yes  1001   rep1
1  1002      b     no  1002   rep2
2  1001      c    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002      g    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3   id2 value2
0  1001   rep1    yes  1001   rep1
1  1002   rep2     no  1002   rep2
2  1001   rep1    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002   rep2    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
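The same merge idea can be written a little more compactly. The following is only a sketch, assuming that a NaN in value2 always means "no replacement available" (that is, dfReplace itself contains no NaN values); the name `merged` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Left join keeps all rows of df in their original order
merged = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')

# Take value2 where the join found a match, otherwise keep the old value1
merged['value1'] = merged['value2'].fillna(merged['value1'])

# Remove the helper columns in one call
merged = merged.drop(columns=['id2', 'value2'])
print(merged)
```

This avoids the boolean mask entirely, at the cost of relying on NaN as the "no match" marker.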
If you have already set the index to the id, this is a bit cleaner, but even if you haven't, you can still do it in one line:
>>> (dfReplace.set_index('id2').rename(columns={'value2': 'value1'})
...  .combine_first(df.set_index('id1')))
     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no
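If you split it into three lines and do the renaming and re-indexing separately, you can see that combine_first() by itself is actually quite simple: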
>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename( columns={'value2':'value1'} )
>>> dfReplace.combine_first(df)
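Since the question asks about .map, here is a minimal sketch of that route as well. It assumes that id2 is unique in dfReplace and that no replacement value is NaN; the name `mapping` is introduced here for illustration. Unlike the combine_first() version, it keeps id1 as a column and preserves the original row order.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})
dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Build a lookup Series: index = id2, values = replacement strings
mapping = dfReplace.set_index('id2')['value2']

# map() returns NaN for ids missing from the lookup; fillna restores the old value
df['value1'] = df['id1'].map(mapping).fillna(df['value1'])
print(df)
```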