How to combine and form a complex data frame in pandas
I have a dataframe named df in the following format:
match_up result
0 1985_1116_1234 1
1 1985_1120_1345 1
2 1985_1207_1250 1
3 1985_1229_1425 1
I have another dataframe named df1:
team win percentage sum_of_last_six seed_frequency
0 1116 0.700 5 7
1 1234 0.667 3 10
2 1120 0.636 4 9
3 1207 0.615 2 11
4 1229 0.345 2 3
5 1345 0.621 5 11
6 1425 0.572 1 2
7 1250 0.968 4 12
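I need to form two new dataframes, df2 and df3. df2 should contain the rows of df1 for all the left-hand values of the match_up column in df (the team ID that immediately follows 1985_), i.e. 1116, 1120, 1207, 1229. df3 should contain the rows for the right-hand values of the match_up column.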
team_df2 win_df2 sum_df2 seed_df2
0 1116 0.700 5 7
1 1120 0.636 4 9
2 1207 0.615 2 11
3 1229 0.345 2 3
team_df3 win_df3 sum_df3 seed_df3
1 1234 0.667 3 10
5 1345 0.621 5 11
7 1250 0.968 4 12
6 1425 0.572 1 2
Finally, I need a new dataframe that combines the three dataframes (df, df2 and df3). It should be named combi and have the following format:
match_up result team_df2 win_df2 sum_df2 seed_df2
0 1985_1116_1234 1 1116 0.700 5 7
1 1985_1120_1345 1 1120 0.636 4 9
2 1985_1207_1250 1 1207 0.615 2 11
3 1985_1229_1425 1 1229 0.345 2 3
team_df3 win_df3 sum_df3 seed_df3
1234 0.667 3 10
1345 0.621 5 11
1250 0.968 4 12
1425 0.572 1 2
How can I do this in pandas?
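You can call the vectorised str methods on the 'match_up' column to split the strings, map the pieces to int and build two lists, which can then be used to filter the second df to create df2 and df3: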
In [90]:
left = list(map(int,(df['match_up'].str.split('_').str[1])))
right = list(map(int,(df['match_up'].str.split('_').str[2])))
print(left)
right
[1116, 1120, 1207, 1229]
Out[90]:
[1234, 1345, 1250, 1425]
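As a variation on the split step above, the string can be split once into separate columns with expand=True instead of being split twice. This is only a minimal sketch: the column names season, left and right are made up for illustration and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})

# Split each string once; expand=True returns one column per piece.
parts = df["match_up"].str.split("_", expand=True)
parts.columns = ["season", "left", "right"]  # hypothetical names

# Convert the team IDs from strings to integers and collect them as lists.
left = parts["left"].astype(int).tolist()
right = parts["right"].astype(int).tolist()
```

The resulting left and right lists hold the same team IDs as in the session above.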
In [91]:
df2 = df1[df1.win.isin(left)]
df2
Out[91]:
team win percentage sum_of_last_six seed_frequency
0 0 1116 0.700 5 7
2 2 1120 0.636 4 9
3 3 1207 0.615 2 11
4 4 1229 0.345 2 3
In [92]:
df3 = df1[df1.win.isin(right)]
df3
Out[92]:
team win percentage sum_of_last_six seed_frequency
1 1 1234 0.667 3 10
5 5 1345 0.621 5 11
6 6 1425 0.572 1 2
7 7 1250 0.968 4 12
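Note that isin keeps the rows in df1's own order, so df3 above comes out as 1234, 1345, 1425, 1250 rather than in match-up order. If the match-up order matters, one possible approach is to index df1 by team ID and select with .loc. The sketch below assumes the question's layout, with the team IDs in a column named team; the name win_percentage is an assumption for the second column.

```python
import pandas as pd

# df1 as laid out in the question (win_percentage is an assumed column name).
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})
left = [1116, 1120, 1207, 1229]
right = [1234, 1345, 1250, 1425]

# Index by team ID so that .loc returns rows in the order of the given list.
by_team = df1.set_index("team")
df2 = by_team.loc[left].reset_index()
df3 = by_team.loc[right].reset_index()
```

Unlike isin, .loc with a list raises a KeyError if any team ID is missing from df1, which may or may not be what you want.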
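Note that in the outputs above the column labels appear to be shifted by one relative to the question (the team IDs ended up under win), which is why the filter uses df1.win; with the layout shown in the question you would filter on df1.team instead.
You can rename the columns by calling rename if required.
To get the combined df using the renamed columns: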
In [95]:
df2 = df2.rename(columns={'team':'team_df2', 'win':'win_df2', 'sum_of_last_six':'sum_df2', 'seed_frequency':'seed_df2'})
df3 = df3.rename(columns={'team':'team_df3', 'win':'win_df3', 'sum_of_last_six':'sum_df3', 'seed_frequency':'seed_df3'})
In [101]:
pd.concat([df,df2,df3],axis=1)
Out[101]:
match_up result team_df2 win_df2 percentage sum_df2 seed_df2 \
0 1985_1116_1234 1 0 1116 0.700 5 7
1 1985_1120_1345 1 NaN NaN NaN NaN NaN
2 1985_1207_1250 1 2 1120 0.636 4 9
3 1985_1229_1425 1 3 1207 0.615 2 11
4 NaN NaN 4 1229 0.345 2 3
5 NaN NaN NaN NaN NaN NaN NaN
6 NaN NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN NaN
team_df3 win_df3 percentage sum_df3 seed_df3
0 NaN NaN NaN NaN NaN
1 1 1234 0.667 3 10
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 5 1345 0.621 5 11
6 6 1425 0.572 1 2
7 7 1250 0.968 4 12
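The NaN rows above arise because concat with axis=1 aligns on index labels, and df, df2 and df3 carry different index labels. One way to get the aligned combi layout from the question is to extract the two team IDs as key columns and left-merge df1 onto df once per side. This is a sketch under assumptions, not the only approach: it uses the question's layout with team IDs in team, and win_percentage is an assumed name for the second column.

```python
import pandas as pd

df = pd.DataFrame({
    "match_up": ["1985_1116_1234", "1985_1120_1345",
                 "1985_1207_1250", "1985_1229_1425"],
    "result": [1, 1, 1, 1],
})
df1 = pd.DataFrame({
    "team": [1116, 1234, 1120, 1207, 1229, 1345, 1425, 1250],
    "win_percentage": [0.700, 0.667, 0.636, 0.615, 0.345, 0.621, 0.572, 0.968],
    "sum_of_last_six": [5, 3, 4, 2, 2, 5, 1, 4],
    "seed_frequency": [7, 10, 9, 11, 3, 11, 2, 12],
})

# Extract the left and right team IDs as integer key columns.
keys = df["match_up"].str.split("_", expand=True)
combi = df.assign(team_df2=keys[1].astype(int), team_df3=keys[2].astype(int))

# Merge the team statistics once per side, suffixing the column names.
for side in ("df2", "df3"):
    side_stats = df1.rename(columns={
        "team": f"team_{side}",
        "win_percentage": f"win_{side}",
        "sum_of_last_six": f"sum_{side}",
        "seed_frequency": f"seed_{side}",
    })
    combi = combi.merge(side_stats, on=f"team_{side}", how="left")

# Put the columns in the order requested in the question.
combi = combi[["match_up", "result",
               "team_df2", "win_df2", "sum_df2", "seed_df2",
               "team_df3", "win_df3", "sum_df3", "seed_df3"]]
```

A left merge keeps one row per match-up in df's original order, so each row of combi holds the statistics for that match-up's own two teams.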