Problems plotting Receiver Operating Characteristic with scikit-learn?
I want to plot a Receiver Operating Characteristic (ROC) curve, so I do the following:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
predictions = auto_wclf.predict_proba(X_test)
false_positive_rate, recall, thresholds = roc_curve(y_test, predictions[:, 1])
roc_auc = auc(false_positive_rate, recall)
plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % roc_auc)
plt.legend(loc='lower right')
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.ylabel('Recall')
plt.xlabel('Fall-out')
plt.show()
But I get this exception:
Traceback (most recent call last):
File "plot.py", line 172, in <module>
false_positive_rate, recall, thresholds = roc_curve(y_test, predictions[:, 1])
File "plot.py", line 890, in roc_curve
y_true, y_score, pos_label=pos_label, sample_weight=sample_weight)
File "/usr/local/lib/python2.7/site-packages/sklearn/metrics/metrics.py", line 710, in _binary_clf_curve
raise ValueError("Data is not binary and pos_label is not specified")
ValueError: Data is not binary and pos_label is not specified
I have a multi-label classification problem (5 classes). Any idea how to plot this? Thanks in advance, everyone.
Yes, a ROC curve "is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied" (wiki).
Moreover, "The extension of ROC curves for classification problems with more than two classes has always been cumbersome, as the degrees of freedom increase quadratically with the number of classes, and the ROC space has c(c-1) dimensions, where c is the number of classes." (same wiki page). Since you have 5 classes, and possibly a multi-label problem, a ROC curve is not suitable for you.
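To make the cause of the traceback concrete, here is a minimal sketch with made-up labels and scores (not the asker's data or model). It assumes a reasonably recent scikit-learn. roc_curve accepts binary targets but raises a ValueError when given five distinct class labels and no pos_label. The exact wording of the message varies between versions, so the sketch only records that the error occurred.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical scores, standing in for predictions[:, 1] in the question.
rng = np.random.RandomState(0)
scores = rng.rand(10)

# Binary ground truth: roc_curve works as intended.
y_binary = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1])
fpr, tpr, thresholds = roc_curve(y_binary, scores)

# Five-class ground truth with no pos_label: roc_curve refuses the input.
y_multiclass = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
try:
    roc_curve(y_multiclass, scores)
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)  # wording depends on the installed version

print("binary input gave", len(fpr), "ROC points")
print("multiclass input raised ValueError:", raised)
```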
Use metrics such as Hamming loss, F1-score, accuracy, precision, or recall instead; choose whichever best fits your task.
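As a rough sketch of how these metrics can be computed with sklearn.metrics, the example below uses small made-up label lists for a 5-class problem. The names y_test and y_pred and the choice of average='macro' are assumptions for illustration; in the question, y_pred would presumably come from auto_wclf.predict(X_test).

```python
from sklearn.metrics import (accuracy_score, f1_score, hamming_loss,
                             precision_score, recall_score)

# Hypothetical true and predicted labels for 5 classes (not the asker's data).
# The last two samples are misclassified (3 <-> 4 swapped).
y_test = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2, 4, 3]

accuracy = accuracy_score(y_test, y_pred)
# For single-label multiclass input this is the fraction of wrong labels.
hamming = hamming_loss(y_test, y_pred)
# 'macro' averages the per-class scores with equal weight per class.
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')

print("accuracy:     %0.2f" % accuracy)
print("Hamming loss: %0.2f" % hamming)
print("precision:    %0.2f" % precision)
print("recall:       %0.2f" % recall)
print("F1-score:     %0.2f" % f1)
```

Other averaging modes such as 'micro' or 'weighted' may suit imbalanced classes better. The same functions also accept binary indicator matrices if the problem is truly multi-label.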