Skip forbidden parameter combinations when using GridSearchCV
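I want to use GridSearchCV to greedily search the entire parameter space of a support vector classifier. However, some combinations of parameters are forbidden by LinearSVC and throw an exception. In particular, there are mutually exclusive combinations of the dual, penalty, and loss parameters.
For example, this code: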
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
              'loss': ['hinge', 'squared_hinge']}
svc = svm.LinearSVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)
raises ValueError: Unsupported set of arguments: The combination of penalty='l2' and loss='hinge' are not supported when dual=False, Parameters: penalty='l2', loss='hinge', dual=False
My question is: is it possible to make GridSearchCV skip the parameter combinations that the model forbids? If not, is there an easy way to construct a parameter space that won't violate the rules?
I solved this problem by passing error_score=0.0 to GridSearchCV:
error_score : ‘raise’ (default) or numeric
Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.
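To see which combinations were actually rejected, something like the following may help. It is only a sketch: it swaps in error_score=np.nan (my substitution, not the 0.0 used in this answer) so that failed fits can be told apart from genuine scores, and it sets refit=False so that only the search itself runs.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
              'loss': ['hinge', 'squared_hinge']}

# error_score=np.nan marks failed fits as NaN instead of raising.
# refit=False skips the final refit, so this only runs the search.
clf = GridSearchCV(svm.LinearSVC(), parameters, error_score=np.nan, refit=False)
clf.fit(iris.data, iris.target)

# Collect the parameter sets whose mean test score is NaN, i.e. whose fits failed.
scores = clf.cv_results_['mean_test_score']
failed = [p for p, s in zip(clf.cv_results_['params'], scores) if np.isnan(s)]
for params in failed:
    print('failed:', params)
```

The list of failed parameter sets can then be used to decide which combinations to leave out of the grid.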
Update: newer versions of sklearn print a bunch of ConvergenceWarning and FitFailedWarning messages. I had a hard time suppressing them with contextlib.suppress, but there is a hack around that involving a testing context manager:
from sklearn import svm, datasets
from sklearn.utils._testing import ignore_warnings
from sklearn.exceptions import FitFailedWarning, ConvergenceWarning
from sklearn.model_selection import GridSearchCV
# category takes a tuple of warning classes
with ignore_warnings(category=(ConvergenceWarning, FitFailedWarning)):
    iris = datasets.load_iris()
    parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
                  'loss': ['hinge', 'squared_hinge']}
    svc = svm.LinearSVC()
    # error_score=0.0 scores failed fits as 0.0 instead of raising
    clf = GridSearchCV(svc, parameters, error_score=0.0)
    clf.fit(iris.data, iris.target)
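As a possible alternative that avoids the private sklearn.utils._testing module, the standard library's warnings.catch_warnings should work too. Note that contextlib.suppress only swallows exceptions, not warnings, which is probably why it did not help. A minimal sketch, assuming the default n_jobs so that the fits run in the current process:

```python
import warnings

from sklearn import datasets, svm
from sklearn.exceptions import ConvergenceWarning, FitFailedWarning
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
              'loss': ['hinge', 'squared_hinge']}
clf = GridSearchCV(svm.LinearSVC(), parameters, error_score=0.0)

# Filters installed inside catch_warnings() are undone when the block exits.
with warnings.catch_warnings():
    for category in (ConvergenceWarning, FitFailedWarning):
        warnings.simplefilter("ignore", category=category)
    clf.fit(iris.data, iris.target)

print(clf.best_params_)
```

Filters set this way apply only to the current process, so with n_jobs greater than 1 the worker processes may still print warnings.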
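If you want to avoid exploring specific combinations entirely (without waiting to run into errors), you have to construct the grid yourself. GridSearchCV can take a list of dicts, in which case the grid spanned by each dictionary in the list is explored.
In this case the conditional logic is not too bad, but it would get really tedious for anything more complicated: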
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
from itertools import product
iris = datasets.load_iris()
duals = [True, False]
penalties = ['l1', 'l2']
losses = ['hinge', 'squared_hinge']
all_params = list(product(duals, penalties, losses))
# Keep only the combinations LinearSVC accepts; each becomes its own single-point grid
filtered_params = [
    {'dual': [dual], 'penalty': [penalty], 'loss': [loss]}
    for dual, penalty, loss in all_params
    if not (penalty == 'l1' and loss == 'hinge')
    and not (penalty == 'l1' and loss == 'squared_hinge' and dual is True)
    and not (penalty == 'l2' and loss == 'hinge' and dual is False)
]
svc = svm.LinearSVC()
clf = GridSearchCV(svc, filtered_params)
clf.fit(iris.data, iris.target)
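When the rules are this few, the list of dicts can also be written by hand, one dict per block of compatible values. The sketch below is pure Python. The expand function is a hypothetical helper of mine, not part of sklearn; it only enumerates what such a list-of-dicts grid covers, so the result can be checked before handing it to GridSearchCV.

```python
from itertools import product

# One dict per block of mutually compatible values (same rules as the filter above).
param_grid = [
    {'penalty': ['l2'], 'loss': ['hinge'], 'dual': [True]},
    {'penalty': ['l2'], 'loss': ['squared_hinge'], 'dual': [True, False]},
    {'penalty': ['l1'], 'loss': ['squared_hinge'], 'dual': [False]},
]


def expand(grid):
    """Yield every concrete parameter dict covered by a list-of-dicts grid."""
    for block in grid:
        keys = sorted(block)
        for values in product(*(block[key] for key in keys)):
            yield dict(zip(keys, values))


combos = list(expand(param_grid))
for combo in combos:
    print(combo)
```

The same param_grid can then be passed in place of filtered_params, as in GridSearchCV(svc, param_grid).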