Pandas: Dataframe self-join with complex conditions
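I have a DataFrame containing data from a set of 2-player games. Each game (with a unique ID) has several rounds, and in each round both players choose an action. It looks like this (I removed some rows for clarity):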
gameId round player action
0 Afom9bWqYBgZXXXN8 1 PvQ8B5kuA9Fbq9N59 1
1 Afom9bWqYBgZXXXN8 1 PJmJgrqusFZ8KRShQ 0
2 Afom9bWqYBgZXXXN8 2 PvQ8B5kuA9Fbq9N59 0
3 Afom9bWqYBgZXXXN8 2 PJmJgrqusFZ8KRShQ 0
4 Afom9bWqYBgZXXXN8 3 PJmJgrqusFZ8KRShQ 0
5 Afom9bWqYBgZXXXN8 3 PvQ8B5kuA9Fbq9N59 0
20 QdZM4yPMnjGj8f25R 1 Q6knaWEruc6BDPQT7 1
21 QdZM4yPMnjGj8f25R 1 xnAjMcWaFRpfBbukz 1
22 QdZM4yPMnjGj8f25R 2 xnAjMcWaFRpfBbukz 1
23 QdZM4yPMnjGj8f25R 2 Q6knaWEruc6BDPQT7 0
24 QdZM4yPMnjGj8f25R 3 Q6knaWEruc6BDPQT7 1
25 QdZM4yPMnjGj8f25R 3 xnAjMcWaFRpfBbukz 1
40 riMD6ctT8DLwdhHpE 1 EKkrMpMqy2PRLm7ur 1
41 riMD6ctT8DLwdhHpE 1 EqbbmngPfZBEmPTzq 1
42 riMD6ctT8DLwdhHpE 2 EKkrMpMqy2PRLm7ur 1
43 riMD6ctT8DLwdhHpE 2 EqbbmngPfZBEmPTzq 1
44 riMD6ctT8DLwdhHpE 3 EqbbmngPfZBEmPTzq 1
45 riMD6ctT8DLwdhHpE 3 EKkrMpMqy2PRLm7ur 1
60 hyEjkAg5K4WpubJA9 1 7CHpY4setLKb9ssnN 1
61 hyEjkAg5K4WpubJA9 1 hbud2J3YvitEhj4xZ 0
62 hyEjkAg5K4WpubJA9 2 hbud2J3YvitEhj4xZ 0
63 hyEjkAg5K4WpubJA9 2 7CHpY4setLKb9ssnN 0
64 hyEjkAg5K4WpubJA9 3 7CHpY4setLKb9ssnN 0
65 hyEjkAg5K4WpubJA9 3 hbud2J3YvitEhj4xZ 1
80 ay5pmpeNcwqHJ8JBH 1 tWA9ZxSnKpZyWwYsQ 1
81 ay5pmpeNcwqHJ8JBH 1 2qiHdJgL4WQe5qrHQ 1
82 ay5pmpeNcwqHJ8JBH 2 2qiHdJgL4WQe5qrHQ 1
83 ay5pmpeNcwqHJ8JBH 2 tWA9ZxSnKpZyWwYsQ 1
84 ay5pmpeNcwqHJ8JBH 3 tWA9ZxSnKpZyWwYsQ 1
85 ay5pmpeNcwqHJ8JBH 3 2qiHdJgL4WQe5qrHQ 1
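I want to add a new column to the DataFrame that, for each player's action in a given round, holds his/her opponent's action in the previous round of the same game, if there is one. What is a fast, concise way to do this instead of a very long (and slow) loop?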
Note that within each (gameId, round) key there are exactly two players, with different IDs. DataFrame.merge seems like a close match (example), but it would need something like this:
df.merge(df_copy, left_on=['gameId', 'round', 'player'], \
right_on=['gameId', df_copy.round - 1, df.player != df_copy.player])
but merge only joins on equal key values, so a condition like df.player != df_copy.player is not supported in the self-join.
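One way around this (a different route from the answer below, sketched under the assumption that each (gameId, round) has exactly two rows) is to split the condition: do the equality part with merge, joining on gameId and a round shifted by one, which pairs each row with both players of the previous round, and then apply the inequality as an ordinary boolean filter. The column names `opponent` and `opp_prev_action` are my own choice; the sample rows are the first two games from the table above.

```python
import pandas as pd

# First two games from the question's table: (original index, gameId, round, player, action).
rows = [
    (0, "Afom9bWqYBgZXXXN8", 1, "PvQ8B5kuA9Fbq9N59", 1),
    (1, "Afom9bWqYBgZXXXN8", 1, "PJmJgrqusFZ8KRShQ", 0),
    (2, "Afom9bWqYBgZXXXN8", 2, "PvQ8B5kuA9Fbq9N59", 0),
    (3, "Afom9bWqYBgZXXXN8", 2, "PJmJgrqusFZ8KRShQ", 0),
    (4, "Afom9bWqYBgZXXXN8", 3, "PJmJgrqusFZ8KRShQ", 0),
    (5, "Afom9bWqYBgZXXXN8", 3, "PvQ8B5kuA9Fbq9N59", 0),
    (20, "QdZM4yPMnjGj8f25R", 1, "Q6knaWEruc6BDPQT7", 1),
    (21, "QdZM4yPMnjGj8f25R", 1, "xnAjMcWaFRpfBbukz", 1),
    (22, "QdZM4yPMnjGj8f25R", 2, "xnAjMcWaFRpfBbukz", 1),
    (23, "QdZM4yPMnjGj8f25R", 2, "Q6knaWEruc6BDPQT7", 0),
    (24, "QdZM4yPMnjGj8f25R", 3, "Q6knaWEruc6BDPQT7", 1),
    (25, "QdZM4yPMnjGj8f25R", 3, "xnAjMcWaFRpfBbukz", 1),
]
df = pd.DataFrame(rows, columns=["idx", "gameId", "round", "player", "action"]).set_index("idx")

# Relabel every row as belonging to the *next* round, so that an equality
# join on "round" pairs round r with round r - 1.
prev = df.rename(columns={"player": "opponent", "action": "opp_prev_action"})
prev["round"] = prev["round"] + 1

# Equality part of the join: same game, previous round.
# Rows from round 2 onward match both players' rows of the previous round.
merged = df.reset_index().merge(prev, on=["gameId", "round"], how="left")

# Inequality part as a filter: drop the pairing of a player with themself.
# Round-1 rows found no match (opponent is NaN) and are kept, since NaN != player.
merged = merged[merged["player"] != merged["opponent"]]

# Align the result back onto the original row labels.
df["opp_prev_action"] = merged.set_index("idx")["opp_prev_action"]
print(df["opp_prev_action"].tolist())
# [nan, nan, 0.0, 1.0, 0.0, 0.0, nan, nan, 1.0, 1.0, 1.0, 0.0]
```

The intermediate frame holds roughly two rows per input row from round 2 onward, which is usually acceptable; the alias-based lookup below avoids building it.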
I think you should first replace the player codes with generic aliases, e.g. 1 and 2. You can do it like this:
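import numpy as np
import pandas as pd
# one row per (game, player), sorted, so each game's two players are adjacent; label them 1, 2, 1, 2, ...
s = df.groupby(['gameId', 'player']).size().reset_index(0, drop=True)
s[:] = np.arange(len(s)) % 2 + 1
df['player_alias'] = s.reindex(df.player).values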
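The lookup s.reindex(df.player) relies on each player ID appearing in only one game: if the same player took part in two games, s would contain that ID twice and reindex raises a ValueError. A possible alternative for that case (my own variant, not part of the original answer) numbers the players within each game directly with pd.factorize, in order of first appearance. Any consistent 1/2 labelling works, because later steps only need "the other alias". The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data in which player "P2" takes part in two different games.
df = pd.DataFrame({
    "gameId": ["g1", "g1", "g1", "g1", "g2", "g2"],
    "round":  [1, 1, 2, 2, 1, 1],
    "player": ["P1", "P2", "P2", "P1", "P2", "P3"],
    "action": [1, 0, 0, 1, 1, 1],
})

# Within each game, map the first player seen to 1 and the second to 2.
df["player_alias"] = df.groupby("gameId")["player"].transform(
    lambda p: pd.factorize(p)[0] + 1
)
print(df["player_alias"].tolist())
# [1, 2, 2, 1, 1, 2]
```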
Then, for each row, you can build an index of the previous round and the opponent's alias, and map it to the corresponding action:
prev_round = df['round'] - 1
opp_player = 3 - df.player_alias # effectively maps 2 to 1 and 1 to 2
ix = pd.MultiIndex.from_arrays([df.gameId, prev_round, opp_player])
df['opp_prev_action'] = df.set_index(['gameId', 'round', 'player_alias']
).reindex(ix).action.values
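For an end-to-end check, here are the two steps above wrapped in a function and applied to the first two games of the question's table. The function name add_opp_prev_action is hypothetical; it works on a copy so the caller's frame is left unchanged and, like the original, it assumes each player ID belongs to a single game and each (gameId, round) has exactly two rows.

```python
import numpy as np
import pandas as pd


def add_opp_prev_action(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with player_alias and opp_prev_action columns added."""
    df = df.copy()
    # Step 1: alias the two players of each game as 1 and 2
    # (assumes a player ID never appears in more than one game).
    s = df.groupby(["gameId", "player"]).size().reset_index(0, drop=True)
    s[:] = np.arange(len(s)) % 2 + 1
    df["player_alias"] = s.reindex(df["player"]).values
    # Step 2: look up (same game, previous round, other alias).
    # Keys with round 0 do not exist, so reindex fills them with NaN.
    ix = pd.MultiIndex.from_arrays(
        [df["gameId"], df["round"] - 1, 3 - df["player_alias"]]
    )
    lookup = df.set_index(["gameId", "round", "player_alias"])
    df["opp_prev_action"] = lookup.reindex(ix)["action"].values
    return df


# First two games from the question's table: (original index, gameId, round, player, action).
rows = [
    (0, "Afom9bWqYBgZXXXN8", 1, "PvQ8B5kuA9Fbq9N59", 1),
    (1, "Afom9bWqYBgZXXXN8", 1, "PJmJgrqusFZ8KRShQ", 0),
    (2, "Afom9bWqYBgZXXXN8", 2, "PvQ8B5kuA9Fbq9N59", 0),
    (3, "Afom9bWqYBgZXXXN8", 2, "PJmJgrqusFZ8KRShQ", 0),
    (4, "Afom9bWqYBgZXXXN8", 3, "PJmJgrqusFZ8KRShQ", 0),
    (5, "Afom9bWqYBgZXXXN8", 3, "PvQ8B5kuA9Fbq9N59", 0),
    (20, "QdZM4yPMnjGj8f25R", 1, "Q6knaWEruc6BDPQT7", 1),
    (21, "QdZM4yPMnjGj8f25R", 1, "xnAjMcWaFRpfBbukz", 1),
    (22, "QdZM4yPMnjGj8f25R", 2, "xnAjMcWaFRpfBbukz", 1),
    (23, "QdZM4yPMnjGj8f25R", 2, "Q6knaWEruc6BDPQT7", 0),
    (24, "QdZM4yPMnjGj8f25R", 3, "Q6knaWEruc6BDPQT7", 1),
    (25, "QdZM4yPMnjGj8f25R", 3, "xnAjMcWaFRpfBbukz", 1),
]
df = pd.DataFrame(rows, columns=["idx", "gameId", "round", "player", "action"]).set_index("idx")

result = add_opp_prev_action(df)
print(result["opp_prev_action"].tolist())
# [nan, nan, 0.0, 1.0, 0.0, 0.0, nan, nan, 1.0, 1.0, 1.0, 0.0]
```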
Note that for round 1, prev_round is 0, which has no matching row and therefore results in NaNs in the new column.
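Because of those NaNs, the new column comes out as float64. If integer action codes are preferred, one option (a sketch, not part of the original answer) is pandas' nullable Int64 dtype, which stores the missing round-1 values as <NA>; applied to the answer it would be df['opp_prev_action'] = df['opp_prev_action'].astype('Int64'). The values below mimic the result for the first game in the question.

```python
import numpy as np
import pandas as pd

# opp_prev_action for one game: rounds 1, 1, 2, 2, 3, 3 (round 1 has no previous round).
opp_prev_action = pd.Series([np.nan, np.nan, 0.0, 1.0, 0.0, 0.0])

# Nullable integer dtype: whole numbers stay integers, NaN becomes <NA>.
as_int = opp_prev_action.astype("Int64")
print(as_int.dtype)  # Int64
```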