Why does CalibratedClassifierCV underperform a direct classifier?
I noticed that sklearn's new CalibratedClassifierCV seems to underperform the direct base_estimator when the base_estimator is GradientBoostingClassifier (I haven't tested other classifiers). Interestingly, if the parameters of make_classification are:
n_features = 10
n_informative = 3
n_classes = 2
then CalibratedClassifierCV seems to be the slight winner (evaluated by log loss).
However, under the classification data set below, CalibratedClassifierCV seems to be generally the underperformer:
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
# Build a classification task using 30 informative features
X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)
skf = cross_validation.StratifiedShuffleSplit(y, 5)
for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas)

    print 'calibrated score:', cv_score
    print 'direct clf score:', clf_score
    print
One run of it yielded:
Maybe I'm missing something about how CalibratedClassifierCV works, or am not using it correctly, but I was under the impression that, if anything, passing a classifier to CalibratedClassifierCV would result in improved performance relative to the base_estimator alone.
Can anyone explain this observed underperformance?
The purpose of using a calibrated classifier is to come up with probability predictions that behave a bit more smoothly than those of a regular classifier. It is not to enhance the performance of your base estimator.
So there is no guarantee that the probabilities or the log loss will be the same (same neighbourhood, but not the same). However, if you plot your samples together with the predicted probabilities, you will probably see a much nicer distribution.
What is mostly preserved is the number of samples above and below the decision threshold (0.5).
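To make that "plot your samples + probabilities" suggestion concrete, here is a minimal, self-contained sketch (binary case, using the current model_selection/calibration API rather than the deprecated cross_validation module from the question) that draws reliability curves before and after calibration:

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Binary toy problem, just to illustrate the reliability plot
X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

raw = GradientBoostingClassifier(n_estimators=100).fit(X_tr, y_tr)
cal = CalibratedClassifierCV(GradientBoostingClassifier(n_estimators=100),
                             cv=3, method='isotonic').fit(X_tr, y_tr)

for name, model in [('uncalibrated', raw), ('calibrated', cal)]:
    # fraction of positives per bin vs. mean predicted probability per bin
    frac_pos, mean_pred = calibration_curve(y_te, model.predict_proba(X_te)[:, 1], n_bins=10)
    plt.plot(mean_pred, frac_pos, marker='o', label=name)
plt.plot([0, 1], [0, 1], 'k--', label='perfectly calibrated')
plt.xlabel('mean predicted probability')
plt.ylabel('fraction of positives')
plt.legend()
plt.show()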
Probability calibration itself requires cross-validation, therefore CalibratedClassifierCV trains one calibrated classifier per fold (in this case using StratifiedKFold) and, when you call predict_proba(), averages the predicted probabilities from each of those classifiers. This can explain the effect.
My hypothesis is that if the training set is small with respect to the number of features and classes, the reduced training set available to each sub-classifier hurts performance, and the ensembling does not make up for it (or even makes it worse). Also, GradientBoostingClassifier might already provide quite good probability estimates from the start, since its loss function is optimized for probability estimation.
If that is correct, then ensembling classifiers in the same way CalibratedClassifierCV does, but without calibration, should be worse than the single classifier. Also, the effect should vanish when a larger number of folds is used for calibration.
To test this, I extended your script to increase the number of folds and to include the ensembled classifier without calibration, and I was able to confirm my predictions. A 10-fold calibrated classifier always performed better than the single classifier, while the uncalibrated ensemble was significantly worse. In my run the 3-fold calibrated classifier also did not really perform worse than the single classifier, so this might also be an unstable effect. These are the detailed results on the same dataset:
This is the code from my experiment:
import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
X, y = make_classification(n_samples=1000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)
skf = cross_validation.StratifiedShuffleSplit(y, 5)
for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (3-fold):', cv_score

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)
    print 'calibrated score (10-fold):', cv_score
    # Train 3 classifiers on folds of the training set and take the average probability
    skf2 = cross_validation.StratifiedKFold(y_train, 3)
    probas_list = []
    for sub_train, sub_test in skf2:
        X_sub_train, X_sub_test = X_train[sub_train], X_train[sub_test]
        y_sub_train, y_sub_test = y_train[sub_train], y_train[sub_test]
        clf = ensemble.GradientBoostingClassifier(n_estimators=100)
        clf.fit(X_sub_train, y_sub_train)
        probas_list.append(clf.predict_proba(X_test))
    probas = np.mean(probas_list, axis=0)
    clf_ensemble_score = log_loss(y_test, probas)
    print 'uncalibrated ensemble clf (3-fold) score:', clf_ensemble_score

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    score = log_loss(y_test, probas)
    print 'direct clf score:', score
    print
There are a couple of problems with the isotonic regression method (and its implementation in sklearn) that make it a suboptimal choice for calibration.
Specifically:
1) It fits a piecewise constant function rather than a smoothly varying curve for the calibration function (a small sketch of this follows the list).
2) The cross-validation averages the results of the models/calibrations it obtains from each fold. However, each of those results is still fit and calibrated only on its respective fold.
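Point 1 is easy to see directly; a tiny sketch fitting sklearn's IsotonicRegression to noisy binary outcomes shows the step-shaped output:

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 1, 50))
y = (rng.uniform(0, 1, 50) < x).astype(float)   # noisy outcomes with P(y=1) = x

iso = IsotonicRegression(out_of_bounds='clip').fit(x, y)
grid = np.linspace(0, 1, 11)
print(np.round(iso.predict(grid), 3))   # flat plateaus with jumps, not a smooth curve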
Often a better choice is the SplineCalibratedClassifierCV class in the ML-insights package (disclaimer: I am an author of that package). The GitHub repo for the package is here.
It has the following advantages:
1) It fits a cubic smoothing spline rather than a piecewise constant function.
2) It uses the entire (cross-validated) answer set for calibration and refits the base model on the full data set. Thus both the calibration function and the base model are effectively trained on the full data set (the general idea is sketched just after this list).
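The scheme behind point 2 can be sketched without the package (this is only an illustration of the idea, not the ML-insights API; a binary case, with Platt-style logistic scaling standing in for the cubic spline that SplineCalibratedClassifierCV actually fits):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, n_informative=3,
                           n_classes=2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

base = GradientBoostingClassifier(n_estimators=100)

# Out-of-fold probabilities for the WHOLE training set (every sample is scored
# by a model that did not see it), pooled into a single calibration data set.
oof = cross_val_predict(base, X_tr, y_tr, cv=5, method='predict_proba')[:, 1]
calibrator = LogisticRegression().fit(oof.reshape(-1, 1), y_tr)

# Refit the base model on the full training set, then calibrate its scores.
base.fit(X_tr, y_tr)
p_calibrated = calibrator.predict_proba(base.predict_proba(X_te)[:, 1].reshape(-1, 1))[:, 1]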
Picking up on the first example, this plot shows the binned probabilities of a training set (red dots) and an independent test set (green + signs), together with the calibration computed by the ML-insights spline method (blue line) and by the isotonic-sklearn method (gray dots/line).
I modified your code to compare the methods (and increased the number of samples). It demonstrates that the spline approach typically performs better (as does the example I linked to above).
Here are the code and the results.
Code (you will have to pip install ml_insights first):
import numpy as np
from sklearn.datasets import make_classification
from sklearn import ensemble
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss
from sklearn import cross_validation
import ml_insights as mli
X, y = make_classification(n_samples=10000,
                           n_features=100,
                           n_informative=30,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=9,
                           random_state=0,
                           shuffle=False)
skf = cross_validation.StratifiedShuffleSplit(y, 5)
for train, test in skf:
    X_train, X_test = X[train], X[test]
    y_train, y_test = y[train], y[test]

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv_mli = mli.SplineCalibratedClassifierCV(clf, cv=3)
    clf_cv_mli.fit(X_train, y_train)
    probas_cv_mli = clf_cv_mli.predict_proba(X_test)
    cv_score_mli = log_loss(y_test, probas_cv_mli)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=3, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score = log_loss(y_test, probas_cv)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf.fit(X_train, y_train)
    probas = clf.predict_proba(X_test)
    clf_score = log_loss(y_test, probas)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv_mli = mli.SplineCalibratedClassifierCV(clf, cv=10)
    clf_cv_mli.fit(X_train, y_train)
    probas_cv_mli = clf_cv_mli.predict_proba(X_test)
    cv_score_mli_10 = log_loss(y_test, probas_cv_mli)

    clf = ensemble.GradientBoostingClassifier(n_estimators=100)
    clf_cv = CalibratedClassifierCV(clf, cv=10, method='isotonic')
    clf_cv.fit(X_train, y_train)
    probas_cv = clf_cv.predict_proba(X_test)
    cv_score_10 = log_loss(y_test, probas_cv)

    print('\nuncalibrated score: {}'.format(clf_score))
    print('\ncalibrated score isotonic-sklearn (3-fold): {}'.format(cv_score))
    print('calibrated score mli (3-fold): {}'.format(cv_score_mli))
    print('\ncalibrated score isotonic-sklearn (10-fold): {}'.format(cv_score_10))
    print('calibrated score mli (10-fold): {}\n'.format(cv_score_mli_10))
Results:
uncalibrated score: 1.4475396740876696
calibrated score isotonic-sklearn (3-fold): 1.465140552847886
calibrated score mli (3-fold): 1.3651638065446683
calibrated score isotonic-sklearn (10-fold): 1.4158622673607426
calibrated score mli (10-fold): 1.3620771116522705
uncalibrated score: 1.5097320476479625
calibrated score isotonic-sklearn (3-fold): 1.5189534673089442
calibrated score mli (3-fold): 1.4386253950100405
calibrated score isotonic-sklearn (10-fold): 1.4976505139437257
calibrated score mli (10-fold): 1.4408912879989917
uncalibrated score: 1.4654527691892194
calibrated score isotonic-sklearn (3-fold): 1.493355643575107
calibrated score mli (3-fold): 1.388789694535648
calibrated score isotonic-sklearn (10-fold): 1.419760490609242
calibrated score mli (10-fold): 1.3830851694161692
uncalibrated score: 1.5163851866969407
calibrated score isotonic-sklearn (3-fold): 1.5532628847926322
calibrated score mli (3-fold): 1.459797287154743
calibrated score isotonic-sklearn (10-fold): 1.4748100659449732
calibrated score mli (10-fold): 1.4620173012979816
uncalibrated score: 1.4760935523959617
calibrated score isotonic-sklearn (3-fold): 1.469434735152088
calibrated score mli (3-fold): 1.402024502986732
calibrated score isotonic-sklearn (10-fold): 1.4702032019673137
calibrated score mli (10-fold): 1.3983943648572212