Issue with scikit-learn's BaggingClassifier and custom base estimator: operands can't be broadcast together?
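I'm trying to use a custom classifier with scikit-learn's BaggingClassifier, but I'm running into an error whose source I can't pin down. My classifier object passes check_estimator(), and fit() runs without problems: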
model = ensemble.BaggingClassifier(customEstimator, max_samples=1/n_estimators, n_estimators=n_estimators)
model.fit(trainfeat, trainlabels)
model.predict(testfeat)
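This produces the traceback below. The base estimator itself makes binary predictions by thresholding a sigmoid. I know the values in the shapes must correspond to the test data, but I don't understand what the three operands are supposed to be. Also, the error seems to come from BaggingClassifier, but the problem must be on my end, right?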
I'd rather not paste the whole estimator, but it inherits from BaseEstimator and I only wrote/overrode three functions: fit, predict, and predict_proba. Am I missing something there?
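I tried reshaping the features and labels to no avail; the error didn't even change. I also tried having my estimator inherit from ClassifierMixin, but that just gave me a new set of problems.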
File "Main_File.py", line 76, in <module>
model.predict(testfeat)
File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 310, in predict
indices.extend(np.where(_predict_binary(e, X) > thresh)[0])
File "G:\Software\Anaconda\lib\site-packages\sklearn\multiclass.py", line 98, in _predict_binary
score = estimator.predict_proba(X)[:, 1]
File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 698, in predict_proba
for i in range(n_jobs))
File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 1003, in __call__
if self.dispatch_one_batch(iterator):
File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 834, in dispatch_one_batch
self._dispatch(tasks)
File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 753, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 201, in apply_async
result = ImmediateResult(func)
File "G:\Software\Anaconda\lib\site-packages\joblib\_parallel_backends.py", line 582, in __init__
self.results = batch()
File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in __call__
for func, args, kwargs in self.items]
File "G:\Software\Anaconda\lib\site-packages\joblib\parallel.py", line 256, in <listcomp>
for func, args, kwargs in self.items]
File "G:\Software\Anaconda\lib\site-packages\sklearn\ensemble\bagging.py", line 129, in _parallel_predict_proba
proba += proba_estimator
ValueError: operands could not be broadcast together with shapes (100000,2) (100000,) (100000,2)
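I'm guessing the problem lies in the output of your customEstimator's predict_proba.
Judging by the shapes in the traceback, your current implementation returns a one-dimensional array of shape (n_samples,) rather than one column per class, which is incompatible with the (n_samples, 2) array that BaggingClassifier accumulates into. Make sure predict_proba returns an array of shape (n_samples, 2) for a binary classification problem.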
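As a rough illustration, the NumPy-only sketch below reproduces the failing `proba += proba_estimator` step from the traceback with a tiny array and then shows one possible fix. The helper name `two_column_proba` and the sample scores are hypothetical, not part of scikit-learn or of the asker's estimator. It also suggests an answer to the "three operands" question: for an in-place addition, NumPy appears to report the two inputs plus the output array, which has the same shape as the accumulator.

```python
import numpy as np


def two_column_proba(p_positive):
    """Hypothetical helper: turn P(class 1) scores of shape (n_samples,)
    or (n_samples, 1) into the (n_samples, 2) layout [P(class 0), P(class 1)]."""
    p = np.asarray(p_positive, dtype=float).reshape(-1)
    return np.column_stack([1.0 - p, p])


# Made-up sigmoid scores, standing in for what a custom predict_proba computes.
scores = 1.0 / (1.0 + np.exp(-np.array([-2.0, 0.0, 3.0])))

# The accumulator in the traceback has shape (n_samples, n_classes).
proba = np.zeros((scores.shape[0], 2))

# Adding a 1-D array of length n_samples to an (n_samples, 2) array fails,
# because the trailing dimensions (2 and n_samples) do not match.
raised = False
try:
    proba += scores
except ValueError:
    raised = True

# Returning one column per class makes the accumulation work.
fixed = two_column_proba(scores)
proba += fixed
```

Inside a custom estimator, the same idea would amount to ending predict_proba with something like `return np.column_stack([1.0 - p, p])`, assuming `p` holds the positive-class probability for each sample and the columns follow the order of `classes_`.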