How to combine and form a complex data frame in pandas

I have a data frame named df in the following format:

       match_up     result
0   1985_1116_1234      1
1   1985_1120_1345      1
2   1985_1207_1250      1
3   1985_1229_1425      1

I have another data frame named df1:

  team       win percentage     sum_of_last_six  seed_frequency
0  1116           0.700                5               7
1  1234           0.667                3              10
2  1120           0.636                4               9
3  1207           0.615                2              11
4  1229           0.345                2               3
5  1345           0.621                5              11
6  1425           0.572                1               2
7  1250           0.968                4              12

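I need to form 2 new data frames, df2 and df3, such that df2 contains the rows for all the left-side values of the match_up column in df (the IDs immediately after 1985_), i.e. 1116, 1120, 1207, 1229. df3 should hold the rows for the values on the right side of the match_up column.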

  team_df2        win_df2           sum_df2       seed_df2
0  1116           0.700                5               7
1  1120           0.636                4               9
2  1207           0.615                2              11
3  1229           0.345                2               3

   team_df3       win_df3           sum_df3       seed_df3
1  1234           0.667                3              10
5  1345           0.621                5              11
7  1250           0.968                4              12
6  1425           0.572                1               2

Finally, I need a new data frame that combines the three data frames (df, df2, df3).

I need to form a new data frame named combi in the following format:

      match_up      result  team_df2   win_df2  sum_df2  seed_df2  
  0 1985_1116_1234      1      1116      0.700      5        7
  1 1985_1120_1345      1      1120      0.636      4        9 
  2 1985_1207_1250      1      1207      0.615      2        11
  3 1985_1229_1425      1      1229      0.345      2        3

     team_df3       win_df3           sum_df3       seed_df3
      1234           0.667                3              10
      1345           0.621                5              11
      1250           0.968                4              12
      1425           0.572                1               2

How can I do this in pandas?

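You can call the vectorised str methods on the 'match_up' column to split the strings, map the pieces to int and build lists, which we can then use to filter the second df to create df2 and df3. Note that in the session below df1 was read with its header split on whitespace, so the team IDs ended up in a column named win (see the outputs); with the column names from the question you would filter on df1['team'] instead: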

In [90]:

left = list(map(int,(df['match_up'].str.split('_').str[1])))
right = list(map(int,(df['match_up'].str.split('_').str[2])))
print(left)
right
[1116, 1120, 1207, 1229]
Out[90]:
[1234, 1345, 1250, 1425]
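
As a possible alternative to the two separate split calls above, here is a minimal sketch that splits once with expand=True and converts with astype(int). The frame is rebuilt from the sample data in the question so the snippet runs on its own; the variable name parts is just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})

# expand=True returns one column per piece:
# 0 = year, 1 = left team, 2 = right team
parts = df["match_up"].str.split("_", expand=True)

left = parts[1].astype(int).tolist()
right = parts[2].astype(int).tolist()

print(left)
print(right)
```

This gives the same left and right lists as the session above, while splitting each string only once.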
In [91]:

df2 = df1[df1.win.isin(left)]
df2
Out[91]:
   team   win  percentage  sum_of_last_six  seed_frequency
0     0  1116       0.700                5               7
2     2  1120       0.636                4               9
3     3  1207       0.615                2              11
4     4  1229       0.345                2               3
In [92]:

df3 = df1[df1.win.isin(right)]
df3
Out[92]:
   team   win  percentage  sum_of_last_six  seed_frequency
1     1  1234       0.667                3              10
5     5  1345       0.621                5              11
6     6  1425       0.572                1               2
7     7  1250       0.968                4              12
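
One thing to watch: isin keeps the rows in df1's own order, which is why 1425 appears before 1250 in Out[92] even though the match-ups list 1250 first. If the order of the match-ups matters, a sketch along the following lines may help. It assumes df1 has the column names shown in the question ('team', 'win percentage', ...) and that each team appears only once; by_team is an illustrative name.

```python
import pandas as pd

# df1 rebuilt from the question's sample data (assumed column names)
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})
left = [1116, 1120, 1207, 1229]
right = [1234, 1345, 1250, 1425]

# Index by team ID, then select by label: .loc returns the rows
# in the order of the list, not in df1's original order
by_team = df1.set_index("team")
df2 = by_team.loc[left].reset_index()
df3 = by_team.loc[right].reset_index()

print(df2)
print(df3)
```

Selecting with .loc raises a KeyError if a team ID is missing from df1, which can be a useful check here.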

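If needed, you can rename the columns by calling rename.

To build a combined df with the renamed columns, you can concatenate along the columns. Be aware that concat aligns rows on the index, not on the team IDs, so the output below contains NaN rows and does not yet match the requested layout: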

In [95]:

df2 = df2.rename(columns={'team':'team_df2', 'win':'win_df2', 'sum_of_last_six':'sum_df2', 'seed_frequency':'seed_df2'})
df3 = df3.rename(columns={'team':'team_df3', 'win':'win_df3', 'sum_of_last_six':'sum_df3', 'seed_frequency':'seed_df3'})
In [101]:

pd.concat([df,df2,df3],axis=1)
Out[101]:
         match_up  result  team_df2  win_df2  percentage  sum_df2  seed_df2  \
0  1985_1116_1234       1         0     1116       0.700        5         7   
1  1985_1120_1345       1       NaN      NaN         NaN      NaN       NaN   
2  1985_1207_1250       1         2     1120       0.636        4         9   
3  1985_1229_1425       1         3     1207       0.615        2        11   
4             NaN     NaN         4     1229       0.345        2         3   
5             NaN     NaN       NaN      NaN         NaN      NaN       NaN   
6             NaN     NaN       NaN      NaN         NaN      NaN       NaN   
7             NaN     NaN       NaN      NaN         NaN      NaN       NaN   

   team_df3  win_df3  percentage  sum_df3  seed_df3  
0       NaN      NaN         NaN      NaN       NaN  
1         1     1234       0.667        3        10  
2       NaN      NaN         NaN      NaN       NaN  
3       NaN      NaN         NaN      NaN       NaN  
4       NaN      NaN         NaN      NaN       NaN  
5         5     1345       0.621        5        11  
6         6     1425       0.572        1         2  
7         7     1250       0.968        4        12
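
Because the rows above are aligned on the index rather than on the team IDs, one way to get closer to the requested combi layout is to merge on the IDs extracted from match_up. The following is only a sketch under some assumptions: df1 uses the column names from the question, each team appears once in df1, and df has a default 0..n-1 index. The names ids, stats and key are illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question
df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})

# Columns 1 and 2 of the split hold the left and right team IDs
ids = df["match_up"].str.split("_", expand=True)

combi = df.copy()
for pos, suffix in ((1, "df2"), (2, "df3")):
    key = f"team_{suffix}"
    # Add the team ID as a key column (relies on the default index)
    combi[key] = ids[pos].astype(int)
    # Rename df1's columns with the suffix wanted in the output
    stats = df1.rename(columns={
        "team": key,
        "win percentage": f"win_{suffix}",
        "sum_of_last_six": f"sum_{suffix}",
        "seed_frequency": f"seed_{suffix}",
    })
    # A left merge keeps one row per match-up, in the original order
    combi = combi.merge(stats, on=key, how="left")

print(combi)
```

Merging on the key columns means the row order of df1 no longer matters, and a team missing from df1 shows up as NaN in its stats columns instead of shifting the other rows.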