Models give different scores when predictions are made in a loop vs. when predictions are made from a list of models
The variable grid.best_estimator_ holds the decision tree model found by GridSearchCV.
for subset in range(len(smol_X_train)):
    # Refit the best estimator on this subset and score it right away
    temp_tree = grid.best_estimator_.fit(smol_X_train[subset], smol_y_train[subset])
    pred = temp_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)
Output:
0.827
0.7025
0.782
0.7205
..
..
0.8365
0.8395
With a list:
tree_list = []
for subset in range(len(smol_X_train)):
    temp_tree = grid.best_estimator_.fit(smol_X_train[subset], smol_y_train[subset])
    tree_list.append(temp_tree)

# Score every stored model after all of the fits have finished
for one_tree in tree_list:
    pred = one_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)
Output:
0.8395
0.8395
0.8395
0.8395
..
..
0.8395
0.8395
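The models in the list all return the same score (the score of the last model).
- Why are the outputs different here? Weren't the models stored in the list each fit on a different subset, so shouldn't they give different predictions too?
- Is the fit of every model (except the last one) lost when it is put into the list?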
Cloning the model, fitting the clone, and then appending it to the list works, rather than appending the model to the list directly.
from sklearn.base import clone

tree_list = []
for subset in range(len(smol_X_train)):
    # clone() returns an unfitted copy, so clone first and then fit the copy
    temp_tree = clone(grid.best_estimator_).fit(smol_X_train[subset], smol_y_train[subset])
    tree_list.append(temp_tree)
    pred = temp_tree.predict(X_test)
    accuracy = accuracy_score(y_test, pred)
    print(accuracy)
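A likely explanation for the behavior in the question: in scikit-learn, fit() returns the estimator itself, so appending its return value to a list stores the same object every time, and each new fit overwrites the previous one. The sketch below illustrates this on synthetic data. The names base_tree and subsets are hypothetical stand-ins for grid.best_estimator_ and the smol_X_train / smol_y_train subsets, not the asker's actual objects. Note that clone() produces an unfitted copy with the same parameters, which is why it has to be applied before fit().

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-ins for grid.best_estimator_ and the training subsets.
rng = np.random.default_rng(0)
base_tree = DecisionTreeClassifier(max_depth=3, random_state=0)
subsets = []
for _ in range(3):
    X = rng.normal(size=(50, 4))
    y = (X[:, 0] + rng.normal(scale=0.5, size=50) > 0).astype(int)
    subsets.append((X, y))

# Without clone: fit() returns the estimator itself, so every list entry
# is the same object, holding only the most recent fit.
shared = [base_tree.fit(X, y) for X, y in subsets]
print(all(tree is base_tree for tree in shared))  # True

# With clone before fit: each entry is a separate estimator with its own fit.
independent = [clone(base_tree).fit(X, y) for X, y in subsets]
print(len({id(tree) for tree in independent}))  # 3

# A clone of a fitted estimator carries the parameters but not the fitted tree.
print(hasattr(clone(base_tree), "tree_"))  # False
```

If the per-subset scores are all that is needed, scoring inside the loop (as in the first snippet of the question) avoids the issue without cloning; cloning matters only when the fitted models themselves must be kept.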