Pandas: Dataframe self-join with complex conditions

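I have a DataFrame containing data from a set of two-player games. Each game (identified by a unique ID) has several rounds, and in each round both players choose an action. It looks like this (some rows removed for clarity):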

    gameId  round   player  action
0   Afom9bWqYBgZXXXN8   1   PvQ8B5kuA9Fbq9N59   1
1   Afom9bWqYBgZXXXN8   1   PJmJgrqusFZ8KRShQ   0
2   Afom9bWqYBgZXXXN8   2   PvQ8B5kuA9Fbq9N59   0
3   Afom9bWqYBgZXXXN8   2   PJmJgrqusFZ8KRShQ   0
4   Afom9bWqYBgZXXXN8   3   PJmJgrqusFZ8KRShQ   0
5   Afom9bWqYBgZXXXN8   3   PvQ8B5kuA9Fbq9N59   0
20  QdZM4yPMnjGj8f25R   1   Q6knaWEruc6BDPQT7   1
21  QdZM4yPMnjGj8f25R   1   xnAjMcWaFRpfBbukz   1
22  QdZM4yPMnjGj8f25R   2   xnAjMcWaFRpfBbukz   1
23  QdZM4yPMnjGj8f25R   2   Q6knaWEruc6BDPQT7   0
24  QdZM4yPMnjGj8f25R   3   Q6knaWEruc6BDPQT7   1
25  QdZM4yPMnjGj8f25R   3   xnAjMcWaFRpfBbukz   1
40  riMD6ctT8DLwdhHpE   1   EKkrMpMqy2PRLm7ur   1
41  riMD6ctT8DLwdhHpE   1   EqbbmngPfZBEmPTzq   1
42  riMD6ctT8DLwdhHpE   2   EKkrMpMqy2PRLm7ur   1
43  riMD6ctT8DLwdhHpE   2   EqbbmngPfZBEmPTzq   1
44  riMD6ctT8DLwdhHpE   3   EqbbmngPfZBEmPTzq   1
45  riMD6ctT8DLwdhHpE   3   EKkrMpMqy2PRLm7ur   1
60  hyEjkAg5K4WpubJA9   1   7CHpY4setLKb9ssnN   1
61  hyEjkAg5K4WpubJA9   1   hbud2J3YvitEhj4xZ   0
62  hyEjkAg5K4WpubJA9   2   hbud2J3YvitEhj4xZ   0
63  hyEjkAg5K4WpubJA9   2   7CHpY4setLKb9ssnN   0
64  hyEjkAg5K4WpubJA9   3   7CHpY4setLKb9ssnN   0
65  hyEjkAg5K4WpubJA9   3   hbud2J3YvitEhj4xZ   1
80  ay5pmpeNcwqHJ8JBH   1   tWA9ZxSnKpZyWwYsQ   1
81  ay5pmpeNcwqHJ8JBH   1   2qiHdJgL4WQe5qrHQ   1
82  ay5pmpeNcwqHJ8JBH   2   2qiHdJgL4WQe5qrHQ   1
83  ay5pmpeNcwqHJ8JBH   2   tWA9ZxSnKpZyWwYsQ   1
84  ay5pmpeNcwqHJ8JBH   3   tWA9ZxSnKpZyWwYsQ   1
85  ay5pmpeNcwqHJ8JBH   3   2qiHdJgL4WQe5qrHQ   1

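I want to add a new column to the DataFrame that holds, for each player's action in a given round, the action his/her opponent took in the previous round of the same game, if there was one. What is a fast, concise way to do this without resorting to a very long (and slow) loop?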

Note that within each (gameId, round) key there are exactly two players, with different IDs. DataFrame.merge seems like a close match, but it would need something like the following:

df.merge(df_copy, left_on=['gameId', 'round', 'player'], \
         right_on=['gameId', df_copy.round - 1, df.player != df_copy.player])

But merge only joins on equality of keys, so a condition such as df.player != df_copy.player is not supported in the self-join.

I think you should first replace the player codes with generic aliases, e.g. 1 and 2. You can do that like this:

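import numpy as np
import pandas as pd
# One entry per (gameId, player), sorted by game; drop gameId so the index is player
s = df.groupby(['gameId', 'player']).size().reset_index(0, drop=True)
s[:] = np.arange(len(s)) % 2 + 1   # label each game's two players 1 and 2
df['player_alias'] = s.reindex(df.player).values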
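
The lookup above is keyed on the player ID alone, so it relies on every player appearing in only one game, as in the sample data. If a player could take part in several games, one possible variation is to number the players within each game instead. The following is only a sketch under that assumption: the toy IDs ('g1', 'a', ...) are made up for illustration, and the aliases follow order of first appearance within a game rather than sorted order.

```python
import pandas as pd

# Toy data with hypothetical IDs; player 'a' takes part in two games.
df = pd.DataFrame({
    'gameId': ['g1', 'g1', 'g1', 'g1', 'g2', 'g2'],
    'round':  [1, 1, 2, 2, 1, 1],
    'player': ['a', 'b', 'b', 'a', 'a', 'c'],
    'action': [1, 0, 0, 0, 1, 1],
})

# Within each game, number the players 1 and 2 in order of first appearance.
# pd.factorize returns integer codes starting at 0, hence the + 1.
df['player_alias'] = df.groupby('gameId')['player'].transform(
    lambda s: pd.factorize(s)[0] + 1
)

print(df)
```

Because the alias is computed per game, the same player may get alias 1 in one game and 2 in another, which is harmless here since the alias is only ever used together with gameId.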

You can then build, for each row, an index made of the previous round and the opposing player, and map it to the corresponding action:

prev_round = df['round'] - 1 
opp_player = 3 - df.player_alias   # effectively maps 2 to 1 and 1 to 2

ix = pd.MultiIndex.from_arrays([df.gameId, prev_round, opp_player])
df['opp_prev_action'] = df.set_index(['gameId', 'round', 'player_alias']
                                     ).reindex(ix).action.values

Note that for round 1, prev_round is 0. No row has round 0, so the reindex finds no match and the new column holds NaN for those rows.
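
For comparison, the inequality condition from the question can also be emulated with an ordinary merge: join on (gameId, round) with the round shifted by one in the right-hand copy, then filter out the rows where a player was paired with themselves. This is a minimal sketch, not a benchmarked replacement for the reindex approach. It assumes exactly two players per (gameId, round), and the toy IDs and the names prev, pairs and result are illustrative only.

```python
import pandas as pd

# Toy data with hypothetical IDs: one game, three rounds, two players.
df = pd.DataFrame({
    'gameId': ['g1'] * 6,
    'round':  [1, 1, 2, 2, 3, 3],
    'player': ['a', 'b', 'a', 'b', 'b', 'a'],
    'action': [1, 0, 0, 0, 0, 1],
})

# Copy of the data re-keyed to the following round, so that a row for
# round r lines up with rows of df for round r + 1.
prev = df[['gameId', 'round', 'player', 'action']].rename(
    columns={'player': 'opp_player', 'action': 'opp_prev_action'})
prev['round'] = prev['round'] + 1

# Equality join on (gameId, round), then apply the "different player"
# condition as a filter, since merge itself cannot express it.
pairs = df[['gameId', 'round', 'player']].merge(prev, on=['gameId', 'round'])
pairs = pairs[pairs['player'] != pairs['opp_player']]

# Left-merge back so that round-1 rows are kept, with NaN in the new column.
result = df.merge(pairs.drop(columns='opp_player'),
                  on=['gameId', 'round', 'player'], how='left')

print(result)
```

The intermediate join produces four rows per (gameId, round) before filtering, so on large data the reindex approach is likely to use less memory.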