Dataframe update code runs perfectly on a test dataframe but not on a larger dataframe
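I am trying to update a dataframe. The update code works fine on a test dataframe, but it does not work on a larger dataframe, and I can't figure out why.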
selection_weights:
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
10 Africa CAF Super Cup 2.07 1.50 3.3 1.45
selection_db:
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
1 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
2 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
3 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
4 Africa African Championship Women 1.1 0.7 3.2 2.2
5 Africa African Nations Championship 1.1 0.7 3.2 2.2
6 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
7 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
8 Africa CAF Champions League 1.1 0.7 3.2 2.2
9 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
10 Africa CAF Super Cup 1.1 0.7 3.2 2.2
11 Africa CECAFA Championship 1.1 0.7 3.2 2.2
12 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
13 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
14 Africa COSAFA Cup 1.1 0.7 3.2 2.2
15 Africa Nile Basin Cup 1.1 0.7 3.2 2.2
16 Africa WAFU Cup of Nations 1.1 0.7 3.2 2.2
ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
10 Africa CAF Super Cup 2.07 1.50 3.3 1.45
11 Africa CECAFA Championship 1.10 0.70 3.2 2.20
12 Africa CECAFA Clubs Cup 1.10 0.70 3.2 2.20
13 Africa COSAFA Championship U20 1.10 0.70 3.2 2.20
14 Africa COSAFA Cup 1.10 0.70 3.2 2.20
15 Africa Nile Basin Cup 1.10 0.70 3.2 2.20
16 Africa WAFU Cup of Nations 1.10 0.70 3.2 2.20
When I switch to the larger dataframes (even just their df.head()), as shown below:
selection_weights = selection_weights.head(10)
print(selection_weights)
country league Win DNB O 1.5 U 4.5
0 Africa Africa Cup of Nations 3.68 1.86 5.2 1.45
1 Africa Africa Cup of Nations U17 2.07 1.50 3.3 1.45
2 Africa Africa Cup of Nations U20 2.07 1.50 3.3 1.45
3 Africa Africa Cup of Nations U23 2.07 1.50 3.3 1.45
4 Africa African Championship Women 2.07 1.50 3.3 1.45
5 Africa African Nations Championship 2.07 1.50 3.3 1.45
6 Africa CAF African Championship U17 2.07 1.50 3.3 1.45
7 Africa CAF African Championship U20 2.07 1.50 3.3 1.45
8 Africa CAF Champions League 2.07 1.50 3.3 1.45
9 Africa CAF Confederation Cup 2.07 1.50 3.3 1.45
selection_db = selection_db.head(15)
print(selection_db)
country league Win DNB O 1.5 U 4.5
140149 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
887344 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
139868 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
142111 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
140735 Africa African Championship Women 1.1 0.7 3.2 2.2
140013 Africa African Nations Championship 1.1 0.7 3.2 2.2
140352 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
142365 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
139831 Africa CAF Champions League 1.1 0.7 3.2 2.2
139738 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
934878 Africa CAF Super Cup 1.1 0.7 3.2 2.2
140675 Africa CECAFA Championship 1.1 0.7 3.2 2.2
141533 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
143054 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
139846 Africa COSAFA Cup 1.1 0.7 3.2 2.2
ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
country league Win DNB O 1.5 U 4.5
140149 Africa Africa Cup of Nations 1.1 0.7 3.2 2.2
887344 Africa Africa Cup of Nations U17 1.1 0.7 3.2 2.2
139868 Africa Africa Cup of Nations U20 1.1 0.7 3.2 2.2
142111 Africa Africa Cup of Nations U23 1.1 0.7 3.2 2.2
140735 Africa African Championship Women 1.1 0.7 3.2 2.2
140013 Africa African Nations Championship 1.1 0.7 3.2 2.2
140352 Africa CAF African Championship U17 1.1 0.7 3.2 2.2
142365 Africa CAF African Championship U20 1.1 0.7 3.2 2.2
139831 Africa CAF Champions League 1.1 0.7 3.2 2.2
139738 Africa CAF Confederation Cup 1.1 0.7 3.2 2.2
934878 Africa CAF Super Cup 1.1 0.7 3.2 2.2
140675 Africa CECAFA Championship 1.1 0.7 3.2 2.2
141533 Africa CECAFA Clubs Cup 1.1 0.7 3.2 2.2
143054 Africa COSAFA Championship U20 1.1 0.7 3.2 2.2
139846 Africa COSAFA Cup 1.1 0.7 3.2 2.2
Why does this happen?
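Possible cause of the problem
DataFrame.update relies on matching index labels (both rows and columns) to decide which values to update.
In your small dataframes, the merge on ids appears to produce no duplicates, so the merged dataframe has an index like that of selection_db. In your large dataframes, however, selection_weights may contain duplicates, in which case the merge produces a larger dataframe whose index does not necessarily match the index of selection_db.
Note also that merge returns its result with a fresh 0..n-1 index, whereas the larger selection_db printed above carries row labels such as 140149 and 887344. With no row labels in common, update has nothing to align on and leaves selection_db unchanged.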
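The index alignment described above can be reproduced on a tiny example. This is only a sketch: the frames `weights` and `db` and their values are made up for illustration, with `db` given non-sequential row labels similar to those in the larger printout.

```python
import pandas as pd

ids = ["country", "league"]

# Hypothetical miniature stand-in for selection_weights.
weights = pd.DataFrame({
    "country": ["Africa", "Africa"],
    "league": ["CAF Super Cup", "COSAFA Cup"],
    "Win": [2.07, 3.68],
})

# Hypothetical miniature stand-in for selection_db, with
# non-sequential row labels like the larger dataframe.
db = pd.DataFrame(
    {
        "country": ["Africa", "Africa", "Africa"],
        "league": ["CAF Super Cup", "COSAFA Cup", "Nile Basin Cup"],
        "Win": [1.1, 1.1, 1.1],
    },
    index=[140149, 887344, 139868],
)

# merge discards the left frame's index and returns a fresh 0..n-1 index.
merged = db[ids].merge(weights, how="left")
print(merged.index.tolist())

# update aligns on row labels; 0, 1, 2 do not occur in db.index,
# so no value in db is overwritten.
db.update(merged)
print(db["Win"].tolist())
```

If `db` had been built with the default index instead, the labels would line up and the same `update` call would overwrite the first two rows, which matches the behaviour seen on the test dataframe.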
Solution (merge is not needed)
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()
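For reference, here is the same approach run end to end on made-up data. It is a sketch under assumptions: the values are illustrative, and one key is deliberately duplicated in `selection_weights` to show that `drop_duplicates` keeps the first occurrence.

```python
import pandas as pd

ids = ["country", "league"]

# Illustrative data only; "CAF Super Cup" appears twice on purpose.
selection_weights = pd.DataFrame({
    "country": ["Africa", "Africa", "Africa"],
    "league": ["CAF Super Cup", "CAF Super Cup", "COSAFA Cup"],
    "Win": [2.07, 9.99, 3.68],
})

selection_db = pd.DataFrame(
    {
        "country": ["Africa", "Africa", "Africa"],
        "league": ["CAF Super Cup", "COSAFA Cup", "Nile Basin Cup"],
        "Win": [1.1, 1.1, 1.1],
    },
    index=[140149, 887344, 139868],
)

# Use the key columns as the index on both sides so that update
# aligns rows on (country, league) rather than on row labels.
selection_db = selection_db.set_index(ids)
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
selection_db = selection_db.reset_index()

print(selection_db)
```

One thing to be aware of: `reset_index()` replaces the original row labels (140149 and so on) with a default 0..n-1 index. If those labels matter, they would need to be saved beforehand and reassigned afterwards.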