Why is my cross_val_score() accuracy very high, but my test accuracy very low?
When using the KerasWrapper, I get a very high training accuracy: over 95%.
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X_train, X_test, y_train, y_test = train_test_split(train_data, train_labels, shuffle=True, test_size=0.3, random_state=42)
estimator = KerasClassifier(build_fn=build_model(130, 130, 20000), epochs=2, batch_size=128, verbose=1)
folds = KFold(n_splits=3, shuffle=True, random_state=128)
results = cross_val_score(estimator=estimator, X=X_train, y=y_train, cv=folds)
However, my prediction accuracy is not good at all. Is this a classic case of overfitting?
prediction = cross_val_predict(estimator=estimator, X=X_test, y=y_test, cv=folds)
metrics.accuracy_score(y_test_converted, prediction)
# accuracy is 0.03%
How can I improve the test accuracy? Thanks.
Is it a classic case of overfitting?
No; your procedure is simply wrong.
cross_val_predict
is not meant to be applied to the test data the way you do here. The low accuracy is most likely because you are effectively retraining your model on each fold of your test set, which is much smaller than your training set.
The correct procedure is to fit your estimator with the training data, get predictions on the test set, and then compute the test accuracy, i.e.:
estimator.fit(X_train, y_train)
y_pred = estimator.predict(X_test)
metrics.accuracy_score(y_test, y_pred)
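To see the whole workflow end to end, here is a minimal, self-contained sketch of the same idea; it uses a scikit-learn LogisticRegression and a synthetic dataset as stand-ins for your Keras model and your data. Cross-validation runs on the training split only (to estimate generalization), and the test set is used exactly once at the end:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic stand-in for train_data / train_labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, test_size=0.3, random_state=42)

estimator = LogisticRegression(max_iter=1000)

# Cross-validation on the TRAINING split only: each fold is
# fit on part of the training data and scored on the rest.
folds = KFold(n_splits=3, shuffle=True, random_state=128)
cv_scores = cross_val_score(estimator, X_train, y_train, cv=folds)

# The test set is used exactly once: fit on train, predict on test.
estimator.fit(X_train, y_train)
test_acc = accuracy_score(y_test, estimator.predict(X_test))
print(cv_scores.mean(), test_acc)
```

If the cross-validation mean and the test accuracy are close, the model is generalizing as the cross-validation estimate promised; a large gap between them is where overfitting would actually show up.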