pandas - Add values of two or more different DataFrames through a list
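I want to add values across three or more DataFrames through a list, rather than adding them one at a time.
First, let me use merging as an example.
The lines below merge the DataFrames (data0, data1, data2) one at a time: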
final_data = data0.merge(data1, on=['player_id', 'player_name'])
final_data = final_data.merge(data2, on=['player_id', 'player_name'])
Alternatively, I can merge the DataFrames through a list, which is very helpful when there are more of them, for example:
from functools import reduce
import pandas as pd
data_list = [data0, data1, data2]
final_data = reduce(lambda left, right: pd.merge(left, right, on=['player_id', 'player_name']), data_list)
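As a self-contained illustration of the list-based merge above, here is a minimal sketch. The three toy frames and their values are made up for the example, and each carries a different stat column so the merged result needs no suffixes.

```python
from functools import reduce

import pandas as pd

keys = ['player_id', 'player_name']

# Hypothetical toy frames: same keys, one distinct stat column each
data0 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'ab': [1, 2]})
data1 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'run': [3, 4]})
data2 = pd.DataFrame({'player_id': [1, 2], 'player_name': ['A', 'B'], 'hit': [5, 6]})

data_list = [data0, data1, data2]

# reduce folds the list left to right: merge(merge(data0, data1), data2)
final_data = reduce(lambda left, right: pd.merge(left, right, on=keys), data_list)
print(final_data)
```

Because reduce only needs a two-argument function, the same pattern works for any pairwise DataFrame operation, not just merge.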
So now I have the following three DataFrames, and I want to add their values together.
data0:
    player_id  player_name  ab  run  hit
0       28920     S. Smith   0    0    0
1       33351   T. Mancini   0    0    0
2       30267    C. Gentry   0    0    0
3       28513     A. Jones   0    0    0
4       31097   M. Machado   0    0    0
5       29170     C. Davis   0    0    0
6       29322    M. Trumbo   0    0    0
7       29564  W. Castillo   0    0    0
8       34885       H. Kim   0    0    0
9       32952   J. Rickard   0    0    0
10      31988    J. Schoop   0    0    0
11       5908   J.J. Hardy   0    0    0
Next, data1:
   player_id  player_name  ab  run  hit
0      28920     S. Smith   1    4    6
1      33351   T. Mancini   0    0    2
2      28513     A. Jones   2    1    0
3      31097   M. Machado   1    8    0
4      34885       H. Kim   1    1    2
5      32952   J. Rickard   0    2    0
6      31988    J. Schoop   5    3    4
7       5908   J.J. Hardy   4    2   10
Next, data2:
   player_id  player_name  ab  run  hit
0      28920     S. Smith   1    9    2
1      31097   M. Machado   3    3    3
2      29170     C. Davis   9    6    4
3      29322    M. Trumbo   3    5    7
4      32952   J. Rickard   1    3    4
5       5908   J.J. Hardy   0    0    5
The final DataFrame I want should look like this:
final_data:
    player_id  player_name  ab  run  hit
0       28920     S. Smith   2   13    8
1       33351   T. Mancini   0    0    2
2       30267    C. Gentry   0    0    0
3       28513     A. Jones   2    1    0
4       31097   M. Machado   4   11    3
5       29170     C. Davis   9    6    4
6       29322    M. Trumbo   3    5    7
7       29564  W. Castillo   0    0    0
8       34885       H. Kim   1    1    2
9       32952   J. Rickard   1    5    4
10      31988    J. Schoop   5    3    4
11       5908   J.J. Hardy   4    2   15
The code below gives me that result, but it adds the DataFrames one at a time.
data0 = pd.read_csv('initial_df.csv')
data1 = pd.read_csv('add_vals1.csv')
data2 = pd.read_csv('add_vals2.csv')
data0 = data0.set_index(['player_id', 'player_name'])
data1 = data1.set_index(['player_id', 'player_name'])
data2 = data2.set_index(['player_id', 'player_name'])
final_data = data0.add(data1, fill_value=0).astype(int).reset_index()
final_data = final_data.set_index(['player_id', 'player_name'])
final_data = final_data.add(data2, fill_value=0).astype(int).reset_index()
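For context, here is a minimal sketch of what fill_value=0 contributes in the code above. The two small frames are hypothetical stand-ins that use a single-level index and one column, which is an assumption made to keep the example short.

```python
import pandas as pd

# Hypothetical frames: the right one is missing player 30267
left = pd.DataFrame({'hit': [0, 0]}, index=pd.Index([28920, 30267], name='player_id'))
right = pd.DataFrame({'hit': [6]}, index=pd.Index([28920], name='player_id'))

plain = left.add(right)                 # 30267 has no partner row, so its sum is NaN
filled = left.add(right, fill_value=0)  # the missing row is treated as 0 before adding

print(plain)
print(filled.astype(int))               # alignment produced floats; cast back to int
```

Without fill_value, every player absent from one of the frames would end up as NaN, and the astype(int) cast would then fail.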
Can anyone help me get the final result through a list, as I did with merge at the top? Thanks a lot!
I believe you need the index_col parameter in read_csv to create a MultiIndex, and then reduce with add:
from functools import reduce
import pandas as pd
data0 = pd.read_csv('initial_df.csv', index_col=['player_id', 'player_name'])
data1 = pd.read_csv('add_vals1.csv', index_col=['player_id', 'player_name'])
data2 = pd.read_csv('add_vals2.csv', index_col=['player_id', 'player_name'])
data_list = [data0, data1, data2]
final_data = reduce(lambda x, y: x.add(y, fill_value=0), data_list).reset_index()
print(final_data)
    player_id  player_name   ab   run   hit
0        5908   J.J. Hardy  4.0   2.0  15.0
1       28513     A. Jones  2.0   1.0   0.0
2       28920     S. Smith  2.0  13.0   8.0
3       29170     C. Davis  9.0   6.0   4.0
4       29322    M. Trumbo  3.0   5.0   7.0
5       29564  W. Castillo  0.0   0.0   0.0
6       30267    C. Gentry  0.0   0.0   0.0
7       31097   M. Machado  4.0  11.0   3.0
8       31988    J. Schoop  5.0   3.0   4.0
9       32952   J. Rickard  1.0   5.0   4.0
10      33351   T. Mancini  0.0   0.0   2.0
11      34885       H. Kim  1.0   1.0   2.0
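The output above comes back as floats and sorted by player_id, while the desired result has integers in the original row order. A minimal sketch of one way to get there, assuming data0 contains every player: cast with astype(int) and reindex on data0's index. The inline frames are toy stand-ins for the CSV files, using a subset of the rows from the question.

```python
from functools import reduce

import pandas as pd

keys = ['player_id', 'player_name']

# Toy stand-ins for the CSV files (subset of the rows in the question)
data0 = pd.DataFrame({'player_id': [28920, 30267, 5908],
                      'player_name': ['S. Smith', 'C. Gentry', 'J.J. Hardy'],
                      'ab': [0, 0, 0], 'run': [0, 0, 0], 'hit': [0, 0, 0]}).set_index(keys)
data1 = pd.DataFrame({'player_id': [28920, 5908],
                      'player_name': ['S. Smith', 'J.J. Hardy'],
                      'ab': [1, 4], 'run': [4, 2], 'hit': [6, 10]}).set_index(keys)
data2 = pd.DataFrame({'player_id': [28920, 5908],
                      'player_name': ['S. Smith', 'J.J. Hardy'],
                      'ab': [1, 0], 'run': [9, 0], 'hit': [2, 5]}).set_index(keys)

data_list = [data0, data1, data2]

# Fold the list with add; rows missing from a frame count as 0
summed = reduce(lambda x, y: x.add(y, fill_value=0), data_list)

# Cast back to int, then restore the row order of data0 (assumed to hold all players)
final_data = summed.astype(int).reindex(data0.index).reset_index()
print(final_data)
```

If a player could be missing from data0, the reindex step would drop that row, so it is only appropriate under the stated assumption.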
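Another option is concat followed by a sum over both index levels:
data_list = [data0, data1, data2]
# DataFrame.sum(level=...) was removed in pandas 2.0; groupby(level=...).sum() is the equivalent
final_data = pd.concat(data_list).groupby(level=[0, 1], sort=False).sum().reset_index()
print(final_data)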
    player_id  player_name  ab  run  hit
0       28920     S. Smith   2   13    8
1       33351   T. Mancini   0    0    2
2       30267    C. Gentry   0    0    0
3       28513     A. Jones   2    1    0
4       31097   M. Machado   4   11    3
5       29170     C. Davis   9    6    4
6       29322    M. Trumbo   3    5    7
7       29564  W. Castillo   0    0    0
8       34885       H. Kim   1    1    2
9       32952   J. Rickard   1    5    4
10      31988    J. Schoop   5    3    4
11       5908   J.J. Hardy   4    2   15
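The concat-then-sum idea can also be wrapped in a small function that groups on the key columns directly, so the frames do not need a MultiIndex first. This is only a sketch: sum_frames is a hypothetical helper name, and the inline frames are toy stand-ins for the CSV files.

```python
import pandas as pd


def sum_frames(frames, keys):
    """Hypothetical helper: stack the frames and sum the remaining columns per key."""
    stacked = pd.concat(frames, ignore_index=True)
    # sort=False keeps keys in order of first appearance;
    # as_index=False returns the keys as ordinary columns.
    return stacked.groupby(keys, sort=False, as_index=False).sum()


keys = ['player_id', 'player_name']
data0 = pd.DataFrame({'player_id': [28920, 29170],
                      'player_name': ['S. Smith', 'C. Davis'],
                      'ab': [0, 0], 'run': [0, 0], 'hit': [0, 0]})
data1 = pd.DataFrame({'player_id': [28920], 'player_name': ['S. Smith'],
                      'ab': [1], 'run': [4], 'hit': [6]})
data2 = pd.DataFrame({'player_id': [28920, 29170],
                      'player_name': ['S. Smith', 'C. Davis'],
                      'ab': [1, 9], 'run': [9, 6], 'hit': [2, 4]})

final_data = sum_frames([data0, data1, data2], keys)
print(final_data)
```

Since no index alignment takes place, no NaN values are introduced and the integer dtype is preserved without an extra cast.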