How to get multi-class roc_auc in cross_validate in sklearn?
I have a classification problem where I want to get the roc_auc value using cross_validate in sklearn. My code is as follows.
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier(random_state = 0, class_weight="balanced")
from sklearn.model_selection import cross_validate
cross_validate(clf, X, y, cv=10, scoring = ('accuracy', 'roc_auc'))
However, I get the following error.
ValueError: multiclass format is not supported
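Please note that I chose roc_auc because it supports both binary and multiclass classification, as mentioned in https://scikit-learn.org/stable/modules/model_evaluation.html
I have binary classification datasets too. Please let me know how to resolve this error.
I am happy to provide more details if needed.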
By default multi_class='raise', so roc_auc_score raises an error on multiclass targets until you set it explicitly.
From the docs:
multi_class {'raise', 'ovr', 'ovo'}, default='raise'
Multiclass only. Determines the type of configuration to use. The default value raises an error, so either 'ovr' or 'ovo' must be passed explicitly.
'ovr': Computes the AUC of each class against the rest [3] [4]. This treats the multiclass case in the same way as the multilabel case. Sensitive to class imbalance even when average == 'macro', because class imbalance affects the composition of each of the 'rest' groupings.
'ovo': Computes the average AUC of all possible pairwise combinations of classes [5]. Insensitive to class imbalance when average == 'macro'.
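To make the two options concrete, here is a minimal sketch that calls roc_auc_score directly on predicted class probabilities. The train/test split and the variable names (proba, auc_ovr, auc_ovo) are my own choices for illustration and are not part of the question.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X = iris.data[:, :2]  # same two features as in the question
y = iris.target

# Hold out a stratified test set (an assumption made for this illustration).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(random_state=0, class_weight="balanced")
clf.fit(X_train, y_train)

# Multiclass AUC needs one probability column per class, not hard labels.
proba = clf.predict_proba(X_test)

auc_ovr = roc_auc_score(y_test, proba, multi_class="ovr")  # each class vs the rest
auc_ovo = roc_auc_score(y_test, proba, multi_class="ovo")  # average over class pairs
print("ovr:", auc_ovr, "ovo:", auc_ovo)
```

Leaving multi_class at its default in the same call raises a ValueError, which is the behaviour the question runs into.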
Solution:
Use make_scorer (docs):
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features.
y = iris.target
from sklearn.ensemble import RandomForestClassifier
clf=RandomForestClassifier(random_state = 0, class_weight="balanced")
from sklearn.metrics import make_scorer
from sklearn.metrics import roc_auc_score
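# needs_proba=True makes the scorer call predict_proba; scikit-learn releases that
# no longer accept it take response_method="predict_proba" instead (check your version)
myscore = make_scorer(roc_auc_score, multi_class='ovo', needs_proba=True)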
from sklearn.model_selection import cross_validate
cross_validate(clf, X, y, cv=10, scoring = myscore)
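As an alternative sketch, assuming scikit-learn 0.22 or later, the predefined scorer names 'roc_auc_ovr' and 'roc_auc_ovo' can be passed as strings. This keeps the tuple form of scoring from the question and avoids building a scorer by hand. The scores variable is just a name I picked.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

iris = datasets.load_iris()
X = iris.data[:, :2]  # same two features as in the question
y = iris.target

clf = RandomForestClassifier(random_state=0, class_weight="balanced")

# 'roc_auc_ovo' is a built-in scorer name; 'roc_auc_ovr' works the same way.
scores = cross_validate(clf, X, y, cv=10, scoring=("accuracy", "roc_auc_ovo"))

# With multiple metrics, results are keyed as 'test_<scorer name>'.
print(scores["test_accuracy"].mean())
print(scores["test_roc_auc_ovo"].mean())
```

For the binary datasets mentioned in the question, the plain 'roc_auc' string should keep working as it is.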