Models give different scores when predictions are made in a loop vs. when predictions are made from a list of models

The variable grid.best_estimator_ holds the decision tree model found by GridSearchCV.

for subset in range(len(smol_X_train)):
    temp_tree = grid.best_estimator_.fit(smol_X_train[subset], smol_y_train[subset])
    pred = temp_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)

Output:

0.827
0.7025 
0.782 
0.7205 
..
..
0.8365
0.8395 

With a list:

tree_list = []

for subset in range(len(smol_X_train)):
    temp_tree = grid.best_estimator_.fit(smol_X_train[subset], smol_y_train[subset])
    tree_list.append(temp_tree)

for one_tree in tree_list:
    pred = one_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)

Output:

0.8395
0.8395
0.8395
0.8395
..
..
0.8395
0.8395

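The models in the list all return the same score (the score of the last model).

  1. Why do the outputs differ here? Weren't the models stored in the list each fitted on a different subset, so shouldn't they give different predictions?
  2. Is the fit of every model except the last one lost when it is put into the list?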

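fit() refits the estimator in place and returns that same object, so every entry in the list is a reference to the one grid.best_estimator_, which only holds the fit from the last subset. Cloning the model, fitting the clone, and then appending it to the list works, instead of appending the model directly.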

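from sklearn.base import clone
from sklearn.metrics import accuracy_score

tree_list = []

for subset in range(len(smol_X_train)):
    # clone() returns a new, unfitted estimator with the same hyperparameters,
    # so it has to be called before fit(), not after.
    temp_tree = clone(grid.best_estimator_).fit(smol_X_train[subset], smol_y_train[subset])
    tree_list.append(temp_tree)
    pred = temp_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)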
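
To see the aliasing directly, here is a minimal self-contained sketch. It does not use the question's data: the synthetic dataset, the three 100-row subsets, and the name base_tree (a stand-in for grid.best_estimator_) are all assumptions made for illustration.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the question's smol_X_train / smol_y_train (assumption).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
subsets = [(X[i:i + 100], y[i:i + 100]) for i in range(0, 300, 100)]

# Hypothetical stand-in for grid.best_estimator_.
base_tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# fit() returns the estimator itself, so every list entry is the same object.
shared = [base_tree.fit(X_sub, y_sub) for X_sub, y_sub in subsets]
all_same_object = all(model is base_tree for model in shared)
print(all_same_object)  # True

# Cloning before fitting gives each subset its own independent model.
independent = [clone(base_tree).fit(X_sub, y_sub) for X_sub, y_sub in subsets]
distinct_ids = len({id(model) for model in independent})
print(distinct_ids)  # 3
```

So nothing is lost when a model is put into the list; the list simply holds several references to one estimator, and each call to fit() overwrites its previous state.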