Merge two datasets of different sizes based on 3 columns
I want to merge two datasets, but I don't know how. The first dataset has the following format:
Year Team Wins Loses
----------------------------
2020 MLK 14 4
2020 BRKL 10 5
2020 PHX 5 10
2019 BRKL 11 4
2019 MLK 10 5
2019 PHX 8 7
2018 ... ... ...
The second dataset has the following format:
Year Team1 Points1 Team2 Points2
---------------------------------------
2020 MLK 80 PHX 66
2020 PHX 71 BRKL 70
2020 BRKL 90 MLK 80
2019 PHX 69 BRKL 70
2019 ... ... ... ...
The final dataset I want has the following format:
Year Team1 Points1 Team2 Points2 Team1Wins Team1Loses Team2Wins Team2Loses
2020 MLK 80 PHX 66 14 4 5 10
2020 PHX 71 BRKL 70 5 10 10 5
2020 BRKL 90 MLK 80 10 5 14 4
2019 PHX 69 BRKL 70 8 7 11 4
2019 ... ... ... ... ... ... ... ...
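I have read the questions here but could not find a solution to my problem. I tried something like the code below, but it is not the correct solution.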
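import pandas as pd

# error_bad_lines=False was removed in pandas 2.0; on_bad_lines="skip" replaces it
a = pd.read_csv("Score.csv", on_bad_lines="skip")
b = pd.read_csv("Team.csv", on_bad_lines="skip")
# Joining on Year alone pairs every game with every team of that year
merged = pd.merge(a, b, how='left', on=['Year'])
print(merged)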
Is there a way to do this?
Use DataFrame.merge together with rename on the columns, once for Team1 and once for Team2:
# Here a holds the per-team records (Year, Team, Wins, Loses) and b holds the games.
# Column names follow the sample tables in the question.
d = {'Wins':'Team1Wins','Loses':'Team1Loses','Team':'Team1'}
df = b.merge(a.rename(columns=d)[['Team1Wins','Team1Loses','Team1','Year']],
             on=['Year', 'Team1'])
d = {'Wins':'Team2Wins','Loses':'Team2Loses','Team':'Team2'}
df = df.merge(a.rename(columns=d)[['Team2Wins','Team2Loses','Team2','Year']],
              on=['Year', 'Team2'])
print(df)
Year Team1 Points1 Team2 Points2 Team1Wins Team1Loses Team2Wins \
0 2020 MLK 80 PHX 66 14 4 5
1 2020 PHX 71 BRKL 70 5 10 10
2 2020 BRKL 90 MLK 80 10 5 14
3 2019 PHX 69 BRKL 70 8 7 11
Team2Loses
0 10
1 5
2 4
3 4
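For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch of the same rename-then-merge idea. The frame names standings and games and the helper add_record are made up for illustration, and the rows are just the sample rows from the question. It also assumes you want to keep every game even when a team is missing from the standings, so it uses how="left" instead of the default inner join, and validate="m:1" so pandas raises if the standings contain duplicate (Year, Team) rows.

```python
import pandas as pd

# Sample rows copied from the question; the frame names are illustrative.
standings = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2019, 2019, 2019],
    "Team": ["MLK", "BRKL", "PHX", "BRKL", "MLK", "PHX"],
    "Wins": [14, 10, 5, 11, 10, 8],
    "Loses": [4, 5, 10, 4, 5, 7],
})
games = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2019],
    "Team1": ["MLK", "PHX", "BRKL", "PHX"],
    "Points1": [80, 71, 90, 69],
    "Team2": ["PHX", "BRKL", "MLK", "BRKL"],
    "Points2": [66, 70, 80, 70],
})


def add_record(games, standings, side):
    """Attach Wins/Loses for the team named in column `side` (hypothetical helper)."""
    renamed = standings.rename(columns={
        "Team": side,
        "Wins": f"{side}Wins",
        "Loses": f"{side}Loses",
    })
    # how="left" keeps every game; validate="m:1" raises if a
    # (Year, team) pair appears more than once in the standings.
    return games.merge(renamed, on=["Year", side], how="left", validate="m:1")


# Merge once per team column, matching on Year plus that team.
result = add_record(games, standings, "Team1")
result = add_record(result, standings, "Team2")
print(result)
```

With a left join, a game whose team has no row in the standings shows NaN in the Wins/Loses columns instead of silently disappearing, which makes gaps in the data easier to spot.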