How to detect overfitting with cross-validation: what should the difference threshold be?

Question

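After building a classification model, I evaluated it using accuracy, precision, and recall. To check for overfitting, I used K-fold cross-validation. I know that if my model's scores differ greatly from my cross-validation scores, the model is overfitting. However, I am stuck on how to define the threshold: how large does the difference in scores have to be before I can conclude that the model is overfitting? For example, here are 3 splits (3-fold CV, shuffle=True, random_state=42) and their respective scores for a logistic regression model: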

Split Number  1
Accuracy= 0.9454545454545454
Precision= 0.94375
Recall= 1.0

Split Number  2
Accuracy= 0.9757575757575757
Precision= 0.9753086419753086
Recall= 1.0

Split Number  3
Accuracy= 0.9695121951219512
Precision= 0.9691358024691358
Recall= 1.0

Training the logistic regression model directly, without CV:

Accuracy= 0.9530201342281879
Precision= 0.952054794520548
Recall= 1.0

So how do I decide how much my scores need to differ before I can infer overfitting?

Answer 1

I assume you are using cross-validation, which splits your data into training and test sets.

You have probably implemented something like this:

from sklearn import datasets, svm
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()
scoring = ['precision_macro', 'recall_macro']
clf = svm.SVC(kernel='linear', C=1, random_state=0)
# By default only the test scores of each fold are returned
scores = cross_validate(clf, iris.data, iris.target, scoring=scoring, cv=5)

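So at the moment you are only computing the test scores, which are very good in all 3 splits.

The first option is:

"return_train_score is set to False by default to save computation time. To evaluate the scores on the training set as well you need to set it to True."

With that enabled you can also see the training score of each fold. If you see a training accuracy of 1.0 while the test accuracy is clearly lower, that is a sign of overfitting.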
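As a minimal sketch of this first option, the snippet below passes return_train_score=True to cross_validate and prints the train/test gap for each fold. The iris data and the linear SVC simply mirror the snippet above and stand in for your own data and model; the name `gap` is introduced here for illustration only.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_validate

# Assumption: iris and a linear SVC are placeholders for your own data and model.
iris = datasets.load_iris()
clf = svm.SVC(kernel="linear", C=1, random_state=0)

scores = cross_validate(
    clf, iris.data, iris.target,
    scoring="accuracy", cv=5,
    return_train_score=True,  # also report the score on each training fold
)

# A large, consistent gap (train much higher than test) hints at overfitting.
gap = scores["train_score"] - scores["test_score"]
for fold, (tr, te, g) in enumerate(
    zip(scores["train_score"], scores["test_score"], gap), start=1
):
    print(f"fold {fold}: train={tr:.3f} test={te:.3f} gap={g:+.3f}")
print(f"mean gap: {gap.mean():+.3f}")
```

There is no universal threshold for the gap. It is usually more informative to compare it with the fold-to-fold variation of the test scores than with a fixed number.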

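The other option is to run more splits. If every test score still shows high accuracy, you can be more confident that the algorithm is not overfitting and that you are doing well.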
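One way to run more splits, sketched here under assumptions, is repeated stratified K-fold. The synthetic imbalanced dataset from make_classification is a stand-in for the real data, and the choice of 5 folds repeated 10 times is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Assumption: synthetic, imbalanced binary data stands in for the real dataset.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.1, 0.9], random_state=42
)

model = LogisticRegression(max_iter=1000)
# 5 folds repeated 10 times with different shuffles gives 50 test scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
test_scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv)

print(f"{len(test_scores)} scores, "
      f"mean={test_scores.mean():.3f}, std={test_scores.std():.3f}")
```

The standard deviation gives a rough sense of how much the score moves between splits. A difference smaller than that spread is hard to interpret as overfitting.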

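Have you added a baseline? I assume this is binary classification, and I have a feeling the dataset is highly imbalanced, so an accuracy of 0.96 may not be that good in general, because a dummy classifier (one that always predicts a single class) would already reach an accuracy of about 0.95.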
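A baseline of this kind can be sketched with scikit-learn's DummyClassifier. The data below is synthetic, with roughly 95% of samples in one class to mimic the suspected imbalance; it is an assumption, not the dataset from the question.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: synthetic data with about 95% of samples in one class.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.05, 0.95],
    flip_y=0, random_state=42,
)

baseline = DummyClassifier(strategy="most_frequent")  # always predicts the majority class
model = LogisticRegression(max_iter=1000)

baseline_acc = cross_val_score(baseline, X, y, scoring="accuracy", cv=5).mean()
model_acc = cross_val_score(model, X, y, scoring="accuracy", cv=5).mean()

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

If the model's accuracy is only marginally above the baseline, accuracy alone says little about the model, and metrics for the minority class are worth checking as well.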


python

classification

machine-learning

cross-validation