Looking up values from a second dataframe
I have 2 dataframes that I want to merge. The first looks like this:
       Date     HomeTeam    AwayTeam
0  06/01/14  Real Madrid  Celta Vigo
1  06/01/14   Celta Vigo    Valencia
The second looks like this:
    EVENT_ID     HomeTeam    AwayTeam    SELECTION   ODDS
0  112324699  Real Madrid  Celta Vigo   Celta Vigo  47.50
1  112324699  Real Madrid  Celta Vigo  Real Madrid   1.13
2  112324699  Real Madrid  Celta Vigo     The Draw  16.00
3  112369682   Celta Vigo    Valencia   Celta Vigo   3.30
4  112369682   Celta Vigo    Valencia     The Draw   3.55
5  112369682   Celta Vigo    Valencia     Valencia   2.43
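So basically, in the second dataframe each match has 3 rows: one for each team and one for the draw (SELECTION), each with its corresponding odds (ODDS).
What I want to do now is add the odds information from the second dataframe to the first one, so that I end up with the following: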
       Date     HomeTeam    AwayTeam  OddsHome  OddsDraw  OddsAway
0  06/01/14  Real Madrid  Celta Vigo      1.13     16.00     47.50
1  06/01/14   Celta Vigo    Valencia      3.30      3.55      2.43
I tried to write and apply a lookup function, but failed miserably.
Maybe you can help me out?
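For anyone who wants to experiment with the question's data, the two frames can be recreated roughly as follows. This is only a sketch: the variable names df1 and df2 are assumed (they are the names the answers use), and the final check simply confirms the three-rows-per-match layout described in the question.

```python
import pandas as pd

# Sample data copied from the question's tables; the names df1/df2 are assumed.
df1 = pd.DataFrame({
    "Date": ["06/01/14", "06/01/14"],
    "HomeTeam": ["Real Madrid", "Celta Vigo"],
    "AwayTeam": ["Celta Vigo", "Valencia"],
})

df2 = pd.DataFrame({
    "EVENT_ID": [112324699] * 3 + [112369682] * 3,
    "HomeTeam": ["Real Madrid"] * 3 + ["Celta Vigo"] * 3,
    "AwayTeam": ["Celta Vigo"] * 3 + ["Valencia"] * 3,
    "SELECTION": ["Celta Vigo", "Real Madrid", "The Draw",
                  "Celta Vigo", "The Draw", "Valencia"],
    "ODDS": [47.50, 1.13, 16.00, 3.30, 3.55, 2.43],
})

# Each match should contribute exactly three rows: home win, away win, draw.
rows_per_event = df2.groupby("EVENT_ID").size()
print(rows_per_event)
```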
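I would reshape df2 into new_df2 with the code below; the resulting frame is shown right after it.
import numpy as np
# Relabel each selection as 'Home', 'Away' or 'Draw' by comparing it with the team columns (overwrites df2['SELECTION'])
df2['SELECTION'] = np.where(df2['SELECTION'] == df2['HomeTeam'], 'Home', np.where(df2['SELECTION'] == df2['AwayTeam'], 'Away', 'Draw'))
# Pivot SELECTION into columns (one row per match), then flatten the ('ODDS', label) column tuples into plain strings
new_df2 = df2.set_index(['EVENT_ID', 'HomeTeam', 'AwayTeam', 'SELECTION']).unstack().reset_index()
new_df2.columns = new_df2.columns.map(''.join)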
    EVENT_ID     HomeTeam    AwayTeam  ODDSAway  ODDSDraw  ODDSHome
0  112324699  Real Madrid  Celta Vigo     47.50     16.00      1.13
1  112369682   Celta Vigo    Valencia      2.43      3.55      3.30
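The column names ODDSAway, ODDSDraw and ODDSHome come from flattening a two-level column index. The sketch below illustrates that step on a cut-down, hypothetical frame (one match, made-up EVENT_ID of 1, selections already relabelled); it is meant only to show what unstack() produces and what ''.join does to it.

```python
import pandas as pd

# Hypothetical, already relabelled sample: one match only.
odds = pd.DataFrame({
    "EVENT_ID": [1, 1, 1],
    "SELECTION": ["Home", "Draw", "Away"],
    "ODDS": [1.13, 16.00, 47.50],
})

wide = odds.set_index(["EVENT_ID", "SELECTION"]).unstack().reset_index()

# After unstack() the column labels are tuples: (original column, SELECTION value).
before = wide.columns.tolist()
print(before)
# [('EVENT_ID', ''), ('ODDS', 'Away'), ('ODDS', 'Draw'), ('ODDS', 'Home')]

# ''.join concatenates each tuple into a single flat string label.
wide.columns = wide.columns.map("".join)
print(wide.columns.tolist())
# ['EVENT_ID', 'ODDSAway', 'ODDSDraw', 'ODDSHome']
```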
Now use merge:
df1.merge(new_df2, on = ['HomeTeam', 'AwayTeam']).drop('EVENT_ID', axis = 1)
which gives you:
       Date     HomeTeam    AwayTeam  ODDSAway  ODDSDraw  ODDSHome
0  06/01/14  Real Madrid  Celta Vigo     47.50     16.00      1.13
1  06/01/14   Celta Vigo    Valencia      2.43      3.55      3.30
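The merged result has the right values, but its columns are named ODDSAway/ODDSDraw/ODDSHome and ordered alphabetically, whereas the question asked for OddsHome, OddsDraw, OddsAway. A possible end-to-end variant that produces the requested layout is sketched below. It is a variation on the approach above rather than the answer's exact code: it relabels on a copy instead of overwriting df2, selects the ODDS column before unstacking so there is no column tuple to flatten, and uses add_prefix and merge's validate argument, none of which appear in the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Date": ["06/01/14", "06/01/14"],
    "HomeTeam": ["Real Madrid", "Celta Vigo"],
    "AwayTeam": ["Celta Vigo", "Valencia"],
})
df2 = pd.DataFrame({
    "EVENT_ID": [112324699] * 3 + [112369682] * 3,
    "HomeTeam": ["Real Madrid"] * 3 + ["Celta Vigo"] * 3,
    "AwayTeam": ["Celta Vigo"] * 3 + ["Valencia"] * 3,
    "SELECTION": ["Celta Vigo", "Real Madrid", "The Draw",
                  "Celta Vigo", "The Draw", "Valencia"],
    "ODDS": [47.50, 1.13, 16.00, 3.30, 3.55, 2.43],
})

# Label every row as Home, Away or Draw without modifying df2 itself.
labels = np.where(df2["SELECTION"] == df2["HomeTeam"], "Home",
                  np.where(df2["SELECTION"] == df2["AwayTeam"], "Away", "Draw"))

# One row per match, one column per outcome, prefixed with "Odds".
wide = (
    df2.assign(SELECTION=labels)
       .set_index(["EVENT_ID", "HomeTeam", "AwayTeam", "SELECTION"])["ODDS"]
       .unstack()
       .add_prefix("Odds")
       .reset_index()
)
wide.columns.name = None  # drop the leftover "SELECTION" label on the column index

# validate="one_to_one" raises if a fixture appears more than once on either side.
result = (
    df1.merge(wide, on=["HomeTeam", "AwayTeam"], validate="one_to_one")
       .drop(columns="EVENT_ID")
)
result = result[["Date", "HomeTeam", "AwayTeam", "OddsHome", "OddsDraw", "OddsAway"]]
print(result)
```

Merging on team names alone assumes each fixture occurs only once in the data; if the same pairing can appear on several dates, a date or event key would also be needed on both sides.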
A different solution:
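# Attach Date from df1 to every odds row
df2 = df2.merge(df1, on=['HomeTeam', 'AwayTeam'], how='left')
# Per match, relabel the team names as 'Home'/'Away'; .values assigns by position, so df2 must already be sorted by EVENT_ID
df2['SELECTION'] = df2.groupby('EVENT_ID').apply(lambda x: x.SELECTION.replace({x.HomeTeam.values[0]: 'Home', x.AwayTeam.values[0]: 'Away'})).values
# Pivot the relabelled selections into one odds column each
df2.set_index(['HomeTeam', 'AwayTeam', 'Date', 'SELECTION']).ODDS.unstack().reset_index()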
Out[1151]:
SELECTION    HomeTeam   AwayTeam      Date   Away  Home  TheDraw
0           CeltaVigo   Valencia  06/01/14   2.43  3.30     3.55
1          RealMadrid  CeltaVigo  06/01/14  47.50  1.13    16.00
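The groupby/apply step in this solution writes its result back by position, so it relies on the rows of df2 being in sorted EVENT_ID order. A possible order-independent variant is sketched below. It is not the answer's code: it swaps the groupby for Series.mask, which aligns on the index, and it deliberately shuffles the question's rows to show that the order no longer matters. With the question's data the draw column keeps its original label, 'The Draw'.

```python
import pandas as pd

# Sample data from the question.
df1 = pd.DataFrame({
    "Date": ["06/01/14", "06/01/14"],
    "HomeTeam": ["Real Madrid", "Celta Vigo"],
    "AwayTeam": ["Celta Vigo", "Valencia"],
})
df2 = pd.DataFrame({
    "EVENT_ID": [112324699] * 3 + [112369682] * 3,
    "HomeTeam": ["Real Madrid"] * 3 + ["Celta Vigo"] * 3,
    "AwayTeam": ["Celta Vigo"] * 3 + ["Valencia"] * 3,
    "SELECTION": ["Celta Vigo", "Real Madrid", "The Draw",
                  "Celta Vigo", "The Draw", "Valencia"],
    "ODDS": [47.50, 1.13, 16.00, 3.30, 3.55, 2.43],
})

# Shuffle the rows on purpose so the two matches are interleaved.
shuffled = df2.iloc[[3, 0, 5, 1, 4, 2]]

merged = shuffled.merge(df1, on=["HomeTeam", "AwayTeam"], how="left")

# mask() replaces values where the condition is True, row by row,
# so no assumption about row order is needed.
selection = merged["SELECTION"].mask(merged["SELECTION"] == merged["HomeTeam"], "Home")
selection = selection.mask(merged["SELECTION"] == merged["AwayTeam"], "Away")
merged["SELECTION"] = selection

wide = (
    merged.set_index(["HomeTeam", "AwayTeam", "Date", "SELECTION"])["ODDS"]
          .unstack()
          .reset_index()
)
wide.columns.name = None
print(wide)
```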