Can any sklearn module return average precision and recall scores for negative class in k-fold cross validation?
I am trying to get the average precision and recall for both the positive and the negative class in 10-fold cross validation. My model is a binary classifier.
I ran the code below, but unfortunately it only returns the average precision and recall for the positive class. How can I tell the algorithm to also return the average precision and recall scores for the negative class?
import numpy as np
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import cross_validate

scoring = {'accuracy' : make_scorer(accuracy_score),
           'precision' : make_scorer(precision_score),
           'recall' : make_scorer(recall_score),
           'f1_score' : make_scorer(f1_score)}

results = cross_validate(model_unbalanced_data_10_times_weight, X, Y, cv=10, scoring=scoring)

np.mean(results['test_precision'])
np.mean(results['test_recall'])
I also tried printing a classification report with the command classification_report(y_test, predictions); the result is shown in the screenshot below. However, I believe the precision/recall scores in that classification report are based on only one run rather than an average over the 10 folds (please correct me if I'm wrong).
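For reference, one possible route to the negative-class numbers directly from cross_validate is to build per-class scorers: precision_score and recall_score accept a pos_label argument, and make_scorer forwards extra keyword arguments to the metric. A minimal sketch, assuming the negative class in Y is labeled 0 and the positive class 1, and reusing the model and data from the snippet above:

import numpy as np
from sklearn.metrics import make_scorer, precision_score, recall_score
from sklearn.model_selection import cross_validate

# pos_label=1 scores the positive class, pos_label=0 the negative class
# (assumes the binary labels in Y are 0 and 1).
scoring = {'precision_pos' : make_scorer(precision_score, pos_label=1),
           'recall_pos'    : make_scorer(recall_score, pos_label=1),
           'precision_neg' : make_scorer(precision_score, pos_label=0),
           'recall_neg'    : make_scorer(recall_score, pos_label=0)}

results = cross_validate(model_unbalanced_data_10_times_weight, X, Y, cv=10, scoring=scoring)

np.mean(results['test_precision_neg'])  # average negative-class precision over the 10 folds
np.mean(results['test_recall_neg'])     # average negative-class recall over the 10 folds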
Following the discussion above, I do think that computing the predictions for each CV fold and then building a classification_report from them is the right approach. The results now take the number of CV folds into account:
>>> from sklearn.metrics import classification_report
>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.model_selection import cross_val_predict
>>>
>>> iris = load_iris()
>>>
>>> rf_clf = RandomForestClassifier()
>>>
>>> preds = cross_val_predict(estimator=rf_clf,
...                           X=iris["data"],
...                           y=iris["target"],
...                           cv=15)
>>>
>>> print(classification_report(iris["target"], preds))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        50
           1       0.92      0.94      0.93        50
           2       0.94      0.92      0.93        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150
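The same recipe carries over to the binary setting from the question; a minimal sketch on synthetic data (the imbalanced make_classification set below is only a stand-in for the asker's X and Y), which yields one precision/recall row per class, including the negative class 0, all computed from the out-of-fold predictions:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import classification_report

# Synthetic imbalanced binary data, standing in for the question's X and Y.
X_bin, y_bin = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

clf = RandomForestClassifier(random_state=0)

# Out-of-fold predictions gathered across all 10 folds.
preds_bin = cross_val_predict(estimator=clf, X=X_bin, y=y_bin, cv=10)

# The report has one row per class: 0 (negative) and 1 (positive).
print(classification_report(y_bin, preds_bin))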