"Too many indices for array" error in make_scorer function in Sklearn
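Goal: use Brier score loss to train a random forest algorithm using GridSearchCV.
Problem: when using make_scorer, the probability predictions for the target "y" have the wrong dimension.
After looking at a previous question, I am using the proxy function suggested there so that GridSearchCV can be scored with the Brier score loss. Here is an example setup: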
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import brier_score_loss,make_scorer
from sklearn.ensemble import RandomForestClassifier
import numpy as np
def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
    return proxied_func(y_true, y_probs[:, class_idx], **kwargs)
brier_scorer = make_scorer(ProbaScoreProxy, greater_is_better=False,
                           needs_proba=True, class_idx=1, proxied_func=brier_score_loss)
X = np.random.randn(100,2)
y = (X[:,0]>0).astype(int)
random_forest = RandomForestClassifier(n_estimators=10)
random_forest.fit(X,y)
probs = random_forest.predict_proba(X)
Passing probs and y directly to brier_score_loss or to ProbaScoreProxy does not raise an error:
ProbaScoreProxy(y,probs,1,brier_score_loss)
Output:
0.0006
Now going through brier_scorer instead:
brier_scorer(random_forest,X,y)
Output:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-28-1474bb08e572> in <module>()
----> 1 brier_scorer(random_forest,X,y)
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_scorer.py in __call__(self, estimator, X, y_true, sample_weight)
167 stacklevel=2)
168 return self._score(partial(_cached_call, None), estimator, X, y_true,
--> 169 sample_weight=sample_weight)
170
171 def _factory_args(self):
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/_scorer.py in _score(self, method_caller, clf, X, y, sample_weight)
258 **self._kwargs)
259 else:
--> 260 return self._sign * self._score_func(y, y_pred, **self._kwargs)
261
262 def _factory_args(self):
<ipython-input-25-5321477444e1> in ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs)
5
6 def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
----> 7 return proxied_func(y_true, y_probs[:, class_idx], **kwargs)
8
9 brier_scorer = make_scorer(ProbaScoreProxy, greater_is_better=False, needs_proba=True, class_idx=1, proxied_func=brier_score_loss)
IndexError: too many indices for array
So it seems that something inside make_scorer changes the dimension of its probability input, but I can't see what the problem is.
Versions:
- sklearn: '0.22.2.post1'
- numpy: '1.18.1'
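Note that y here has the correct dimension (1-d). With a little experimenting you will find that it is the dimension of the y_probs passed to ProbaScoreProxy that causes the problem.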
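Is this just poorly written code in that previous question? Ultimately, is there a way to build a make_scorer object that something like GridSearchCV will accept for training an RF?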
Goal: use brier score loss to train a random forest algorithm using GridSearchCV
For this goal, you can pass the string value 'neg_brier_score' directly as the scoring parameter of GridSearchCV.
For example:
gc = GridSearchCV(random_forest,
                  param_grid={"n_estimators": [5, 10]},
                  scoring="neg_brier_score")
gc.fit(X, y)
print(gc.scorer_)
# make_scorer(brier_score_loss, greater_is_better=False, needs_proba=True)