Pass a scoring function from sklearn.metrics to GridSearchCV
GridSearchCV's documentation states that I can pass a scoring function.
scoring : string, callable or None, default=None
I want to use the native accuracy_score as the scoring function.
So here is my attempt. Imports and some data:
import numpy as np
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn import neighbors
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])
Now, when I use only k-fold cross-validation without my scoring function, everything works as expected:
parameters = {
'n_neighbors': [2, 3, 4],
'weights':['uniform', 'distance'],
'p': [1, 2, 3]
}
model = neighbors.KNeighborsClassifier()
k_fold = KFold(len(Y), n_folds=6, shuffle=True, random_state=0)
clf = GridSearchCV(model, parameters, cv=k_fold) # TODO will change
clf.fit(X, Y)
print clf.best_score_
But when I change the line to
clf = GridSearchCV(model, parameters, cv=k_fold, scoring=accuracy_score) # or accuracy_score()
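I get the error: ValueError: Cannot have number of folds n_folds=10 greater than the number of samples: 6.
I don't think this message points to the real problem.
I think the problem is that accuracy_score does not follow the signature scorer(estimator, X, y) described in the documentation.
So how can I fix this?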
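It will work if you change scoring=accuracy_score to scoring='accuracy' (see the documentation for the full list of scorers you can refer to by name in this way).
In theory, you should be able to pass a custom scoring function as you tried, but my guess is that you are right and accuracy_score does not have the right API.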
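For what it's worth, here is a minimal sketch of three ways to get accuracy as the score. It assumes a newer scikit-learn where GridSearchCV and KFold live in sklearn.model_selection (the question uses the older sklearn.cross_validation and sklearn.grid_search modules). The name accuracy_callable is hypothetical and is only there to show a function with the scorer(estimator, X, y) signature.

```python
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsClassifier

# Same toy data and grid as in the question.
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([0, 1, 0, 0, 0, 1])
parameters = {
    "n_neighbors": [2, 3, 4],
    "weights": ["uniform", "distance"],
    "p": [1, 2, 3],
}

# Assumption: newer KFold API, which takes n_splits instead of (n, n_folds).
k_fold = KFold(n_splits=6, shuffle=True, random_state=0)


def accuracy_callable(estimator, X, y):
    """Hypothetical scorer with the scorer(estimator, X, y) signature."""
    return accuracy_score(y, estimator.predict(X))


# 1) Refer to the built-in scorer by name.
by_name = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold,
                       scoring="accuracy")
by_name.fit(X, Y)

# 2) Wrap the metric function so it gets the scorer signature.
by_scorer = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold,
                         scoring=make_scorer(accuracy_score))
by_scorer.fit(X, Y)

# 3) Pass a hand-written callable with the scorer signature.
by_callable = GridSearchCV(KNeighborsClassifier(), parameters, cv=k_fold,
                           scoring=accuracy_callable)
by_callable.fit(X, Y)

print(by_name.best_score_, by_scorer.best_score_, by_callable.best_score_)
```

All three variants should report the same best score, since they compute the same metric on the same folds. Passing accuracy_score directly does not fit this pattern because it expects (y_true, y_pred) rather than (estimator, X, y).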
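Here is an example that uses weighted Kappa as the GridSearchCV scoring metric for a simple random forest model. The key lesson for me was to pass the metric's own parameters through the make_scorer function.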
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import cohen_kappa_score, make_scorer
# Extra keyword arguments to make_scorer are forwarded to cohen_kappa_score
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")
# Create the parameter grid based on the results of random search
param_grid = {
    'bootstrap': [True],
    'max_features': range(2, 10),  # try 2 to 9 features (the upper bound is exclusive)
    'min_samples_leaf': [3, 4, 5],
    'n_estimators': [100, 300, 500],
    'max_depth': [5]
}
# Create a base model
random_forest = RandomForestClassifier(class_weight="balanced_subsample", random_state=1)
# Instantiate the grid search model
grid_search = GridSearchCV(estimator=random_forest, param_grid=param_grid,
                           cv=5, n_jobs=-1, verbose=2, scoring=kappa_scorer)  # search for the best model using weighted kappa
# Fit the grid search to the data (final_tr and yTrain are my training features and labels)
grid_search.fit(final_tr, yTrain)
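Since final_tr and yTrain are not shown, here is a small self-contained sketch of the same idea on synthetic data. The dataset from make_classification, the reduced grid, and the small forest are all assumptions chosen so it runs quickly; they are not the settings from the example. It also checks that the scorer built by make_scorer really forwards weights="quadratic" to cohen_kappa_score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic 3-class data standing in for the unseen training set.
X, y = make_classification(n_samples=150, n_features=8, n_informative=4,
                           n_classes=3, random_state=0)

# Keyword arguments given to make_scorer are passed on to the metric.
kappa_scorer = make_scorer(cohen_kappa_score, weights="quadratic")

# Deliberately small grid and forest so the sketch finishes quickly.
grid_search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=1),
    param_grid={"max_depth": [3, 5], "min_samples_leaf": [3, 5]},
    cv=3,
    scoring=kappa_scorer,
)
grid_search.fit(X, y)

# The scorer is called as scorer(estimator, X, y); compare it with a
# direct call to the metric using the same weights.
best = grid_search.best_estimator_
via_scorer = kappa_scorer(best, X, y)
direct = cohen_kappa_score(y, best.predict(X), weights="quadratic")

print(grid_search.best_params_, grid_search.best_score_)
print(via_scorer, direct)
```

The two printed kappa values should match, which is a quick way to confirm that the metric parameters ended up where you intended.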