Dataframe update code runs perfectly on a test dataframe but not on a larger dataframe

I am trying to update a DataFrame. The update code works fine on a test DataFrame, but it does not work on a larger one, and I can't figure out why.

selection_weights:
   country                        league   Win   DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1   Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2   Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3   Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4   Africa    African Championship Women  2.07  1.50    3.3   1.45
5   Africa  African Nations Championship  2.07  1.50    3.3   1.45
6   Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7   Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8   Africa          CAF Champions League  2.07  1.50    3.3   1.45
9   Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45
10  Africa                 CAF Super Cup  2.07  1.50    3.3   1.45

selection_db:
   country                        league  Win  DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
1   Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
2   Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
3   Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
4   Africa    African Championship Women  1.1  0.7    3.2    2.2
5   Africa  African Nations Championship  1.1  0.7    3.2    2.2
6   Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
7   Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
8   Africa          CAF Champions League  1.1  0.7    3.2    2.2
9   Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
10  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
11  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
12  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
13  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
14  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2
15  Africa                Nile Basin Cup  1.1  0.7    3.2    2.2
16  Africa           WAFU Cup of Nations  1.1  0.7    3.2    2.2

ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))

print(selection_db)
   country                        league   Win   DNB  O 1.5  U 4.5
0   Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1   Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2   Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3   Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4   Africa    African Championship Women  2.07  1.50    3.3   1.45
5   Africa  African Nations Championship  2.07  1.50    3.3   1.45
6   Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7   Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8   Africa          CAF Champions League  2.07  1.50    3.3   1.45
9   Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45
10  Africa                 CAF Super Cup  2.07  1.50    3.3   1.45
11  Africa           CECAFA Championship  1.10  0.70    3.2   2.20
12  Africa              CECAFA Clubs Cup  1.10  0.70    3.2   2.20
13  Africa       COSAFA Championship U20  1.10  0.70    3.2   2.20
14  Africa                    COSAFA Cup  1.10  0.70    3.2   2.20
15  Africa                Nile Basin Cup  1.10  0.70    3.2   2.20
16  Africa           WAFU Cup of Nations  1.10  0.70    3.2   2.20

When I switch to the larger DataFrames (even just taking df.head() of them), I get the following:

selection_weights = selection_weights.head(10)
print(selection_weights)
  country                        league   Win   DNB  O 1.5  U 4.5
0  Africa         Africa Cup of Nations  3.68  1.86    5.2   1.45
1  Africa     Africa Cup of Nations U17  2.07  1.50    3.3   1.45
2  Africa     Africa Cup of Nations U20  2.07  1.50    3.3   1.45
3  Africa     Africa Cup of Nations U23  2.07  1.50    3.3   1.45
4  Africa    African Championship Women  2.07  1.50    3.3   1.45
5  Africa  African Nations Championship  2.07  1.50    3.3   1.45
6  Africa  CAF African Championship U17  2.07  1.50    3.3   1.45
7  Africa  CAF African Championship U20  2.07  1.50    3.3   1.45
8  Africa          CAF Champions League  2.07  1.50    3.3   1.45
9  Africa         CAF Confederation Cup  2.07  1.50    3.3   1.45

selection_db = selection_db.head(15)
print(selection_db)
       country                        league  Win  DNB  O 1.5  U 4.5
140149  Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
887344  Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
139868  Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
142111  Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
140735  Africa    African Championship Women  1.1  0.7    3.2    2.2
140013  Africa  African Nations Championship  1.1  0.7    3.2    2.2
140352  Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
142365  Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
139831  Africa          CAF Champions League  1.1  0.7    3.2    2.2
139738  Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
934878  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
140675  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
141533  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
143054  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
139846  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2

ids = ['country', 'league']
selection_db.update(selection_db[ids].merge(selection_weights, how='left'))
print(selection_db)
       country                        league  Win  DNB  O 1.5  U 4.5
140149  Africa         Africa Cup of Nations  1.1  0.7    3.2    2.2
887344  Africa     Africa Cup of Nations U17  1.1  0.7    3.2    2.2
139868  Africa     Africa Cup of Nations U20  1.1  0.7    3.2    2.2
142111  Africa     Africa Cup of Nations U23  1.1  0.7    3.2    2.2
140735  Africa    African Championship Women  1.1  0.7    3.2    2.2
140013  Africa  African Nations Championship  1.1  0.7    3.2    2.2
140352  Africa  CAF African Championship U17  1.1  0.7    3.2    2.2
142365  Africa  CAF African Championship U20  1.1  0.7    3.2    2.2
139831  Africa          CAF Champions League  1.1  0.7    3.2    2.2
139738  Africa         CAF Confederation Cup  1.1  0.7    3.2    2.2
934878  Africa                 CAF Super Cup  1.1  0.7    3.2    2.2
140675  Africa           CECAFA Championship  1.1  0.7    3.2    2.2
141533  Africa              CECAFA Clubs Cup  1.1  0.7    3.2    2.2
143054  Africa       COSAFA Championship U20  1.1  0.7    3.2    2.2
139846  Africa                    COSAFA Cup  1.1  0.7    3.2    2.2

Why does this happen?

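Likely cause

DataFrame.update aligns on index and column labels internally: it only overwrites values where the row and column labels of the two frames match.

In your small DataFrames, merging on ids appears to produce no duplicate rows, so the merged DataFrame ends up with an index like that of selection_db (0, 1, 2, ...). In your large DataFrames, selection_weights may contain duplicate ids, in which case the merge returns more rows than selection_db has, and the resulting index does not necessarily match that of selection_db. Note also that merge returns a fresh 0..n-1 index, whereas the larger selection_db printed in the question is labeled 140149, 887344, and so on; row labels that differ like this are enough for update to find nothing to overwrite.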
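
The alignment behavior can be reproduced on a few rows. This is only a minimal sketch: the frames db and weights are made-up stand-ins for selection_db and selection_weights, reduced to a single value column, and the index labels are copied from the question's output.

```python
import pandas as pd

ids = ["country", "league"]

# Hypothetical stand-in for selection_weights.
weights = pd.DataFrame({
    "country": ["Africa", "Africa"],
    "league": ["Africa Cup of Nations", "CAF Super Cup"],
    "Win": [3.68, 2.07],
})

# Hypothetical stand-in for the larger selection_db: non-default row labels.
db = pd.DataFrame(
    {
        "country": ["Africa", "Africa", "Africa"],
        "league": ["Africa Cup of Nations", "CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1, 1.1],
    },
    index=[140149, 934878, 139846],
)

# A column-based merge returns a fresh 0..n-1 index.
merged = db[ids].merge(weights, how="left")
print(merged.index.tolist())
print(db.index.tolist())

# No row label is shared, so update() has nothing to overwrite.
before = db.copy()
db.update(merged)
unchanged = db.equals(before)

# Reusing db's labels makes the rows line up again. This is only safe when
# the merge produced exactly one row per row of db (no duplicate keys).
aligned = merged.copy()
aligned.index = db.index
db.update(aligned)
print(db)
```

The last step works here only because weights has no duplicate ids; with duplicates the merge would return more rows than db has, and the index assignment would fail with a length mismatch.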

Solution (no merge needed)

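# Use the key columns as the index so that update() aligns on them
selection_db = selection_db.set_index(ids)
# Drop duplicate keys so that each (country, league) pair has a single row of weights
selection_db.update(selection_weights.drop_duplicates(ids).set_index(ids))
# Turn the keys back into regular columns (this yields a new 0..n-1 index)
selection_db = selection_db.reset_index()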
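
For reference, here is the same approach run end to end on a few made-up rows. The data is hypothetical (two value columns, one deliberately duplicated key in selection_weights), and the last line, which puts the original row labels back, is an optional addition that is only needed if you want to keep labels such as 140149.

```python
import pandas as pd

ids = ["country", "league"]

# Hypothetical weights with a duplicated (country, league) key.
selection_weights = pd.DataFrame({
    "country": ["Africa", "Africa", "Africa"],
    "league": ["Africa Cup of Nations", "CAF Super Cup", "CAF Super Cup"],
    "Win": [3.68, 2.07, 9.99],
    "DNB": [1.86, 1.50, 9.99],
})

# Hypothetical db with non-default row labels, as in the question.
selection_db = pd.DataFrame(
    {
        "country": ["Africa"] * 3,
        "league": ["Africa Cup of Nations", "CAF Super Cup", "COSAFA Cup"],
        "Win": [1.1, 1.1, 1.1],
        "DNB": [0.7, 0.7, 0.7],
    },
    index=[140149, 934878, 139846],
)

original_index = selection_db.index

# Align on the key columns instead of on the row labels.
updated = selection_db.set_index(ids)
# drop_duplicates keeps the first row for each key by default.
updated.update(selection_weights.drop_duplicates(ids).set_index(ids))
updated = updated.reset_index()

# Optional: restore the original row labels (row order is unchanged).
updated.index = original_index
print(updated)
```

Rows whose key is missing from selection_weights (COSAFA Cup here) keep their old values, which matches the output of the small example in the question.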