How to use classifiers used by cross_validation_scores

Question

I am trying to train a cross-validated SVM model (for a school project). Given X and y, when I call

from sklearn import svm
from sklearn.model_selection import cross_val_score
clf = svm.SVC(gamma='scale')
scores = cross_val_score(clf, X, y, cv=4)

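scores is set to an array as expected, but I want to be able to call clf.predict(test_x). When I do, it throws an exception with the message: This SVC instance is not fitted yet. Call 'fit' with appropriate arguments before using this method. (I had hoped it would return something like [scores, predictor], or perhaps a CrossValidationPredictor with a predict method, but that is not the case.)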

Of course, I can call classifier = clf.fit(X, y), but that doesn't give me a cross-validated SVM predictor. How do I get a cross-validated predictor that I can use to, you know, predict?

Answer 1

Of course, I can call classifier = clf.fit(X, y) but that doesn't give me a cross validated SVM predictor, how do I get a cross validated predictor that I can use to—you know—predict?

clf.fit(X, y) is exactly what you should do.

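There is no such thing as a cross-validated predictor, because cross-validation is not a method for training a predictor but a method for validating a type of predictor. Let me quote the Wikipedia entry: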

Cross-validation [...] is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.

(The "statistical analysis" here includes predictive models such as regressors or classifiers.)
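To make this concrete, here is a minimal sketch of what happens to clf during cross-validation. It assumes the iris data set as a stand-in for the asker's X and y. cross_val_score fits clones of the estimator on each fold, so clf itself stays unfitted. If the per-fold models are wanted for inspection, cross_validate with return_estimator=True hands them back, although each of those has only seen part of the data.

```python
from sklearn import datasets, svm
from sklearn.exceptions import NotFittedError
from sklearn.model_selection import cross_val_score, cross_validate

# Assumption: iris stands in for the X and y from the question.
X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(gamma='scale')

# Scoring fits clones of clf, one per fold; clf itself is not modified.
scores = cross_val_score(clf, X, y, cv=4)

try:
    clf.predict(X[:1])
    still_unfitted = False
except NotFittedError:
    still_unfitted = True

# The fitted per-fold clones can be returned explicitly if needed.
cv_results = cross_validate(clf, X, y, cv=4, return_estimator=True)
fold_models = cv_results['estimator']  # list of 4 fitted SVC objects
```

Each entry of fold_models can predict, but none of them was trained on the full data set, which is why they are usually used for diagnostics rather than deployment.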

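The question cross-validation answers is: "How well will my classifier perform when I apply it to data I don't have yet?" Typically, you cross-validate several classifiers or hyperparameter settings and then select the one with the highest score, which is the one expected to generalize best to unseen data.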

Finally, you train the chosen classifier on the full data set, because you want to deploy the best possible classifier.
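The workflow described above could be sketched roughly as follows. The two candidate kernels and the iris data set are assumptions chosen only for illustration; any set of candidate models or hyperparameters would do.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

# Assumption: iris stands in for the X and y from the question.
X, y = datasets.load_iris(return_X_y=True)

# Hypothetical candidates to compare; these are not taken from the question.
candidates = {
    'rbf': svm.SVC(kernel='rbf', gamma='scale'),
    'linear': svm.SVC(kernel='linear'),
}

# Step 1: estimate how well each candidate generalizes.
mean_scores = {
    name: cross_val_score(model, X, y, cv=4).mean()
    for name, model in candidates.items()
}

# Step 2: pick the candidate with the highest mean cross-validation score.
best_name = max(mean_scores, key=mean_scores.get)

# Step 3: train the winner on all available data, then predict with it.
classifier = candidates[best_name].fit(X, y)
predictions = classifier.predict(X[:5])
```

The cross-validation scores describe how the chosen configuration is expected to perform; the final classifier is simply that configuration fitted once on everything.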

Answer 2

Maybe you could take a look at grid search:

Grid-search

scikit-learn provides an object that, given data, computes the score during the fit of an estimator on a parameter grid and chooses the parameters to maximize the cross-validation score. This object takes an estimator during the construction and exposes an estimator API

Example:

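>>> import numpy as np
>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import GridSearchCV, cross_val_score
>>> # Setup assumed from the linked tutorial: digits data and a linear SVC
>>> X_digits, y_digits = datasets.load_digits(return_X_y=True)
>>> svc = svm.SVC(kernel='linear')
>>> Cs = np.logspace(-6, -1, 10)
>>> clf = GridSearchCV(estimator=svc, param_grid=dict(C=Cs),
...                    n_jobs=-1)
>>> clf.fit(X_digits[:1000], y_digits[:1000])
GridSearchCV(cv=None,...
>>> clf.best_score_
0.925...
>>> clf.best_estimator_.C
0.0077...

>>> # Prediction performance on test set is not as good as on train set
>>> clf.score(X_digits[1000:], y_digits[1000:])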

Here is the page to check: https://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
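As a small follow-up sketch, and assuming the default refit=True is left in place, a fitted GridSearchCV object retrains the best parameter setting on all the data passed to fit and then exposes predict directly. That is probably the closest thing to the [scores, predictor] pair the question asks for. The train/test split and the linear kernel below mirror the example above and are otherwise arbitrary.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_digits(return_X_y=True)
X_train, y_train = X[:1000], y[:1000]
X_test = X[1000:]

# With refit=True (the default), the best setting is refitted on X_train.
search = GridSearchCV(
    svm.SVC(kernel='linear'),
    param_grid={'C': np.logspace(-6, -1, 10)},
    cv=4,
)
search.fit(X_train, y_train)

# The search object delegates predict to the refitted best estimator.
predictions = search.predict(X_test)
per_setting_scores = search.cv_results_['mean_test_score']
```

Here per_setting_scores holds the mean cross-validation score for each value of C, while predictions come from the single model refitted with the best one.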


Tags: svm, scikit-learn, cross-validation