Skip forbidden parameter combinations when using GridSearchCV

Question

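I want to greedily search the entire parameter space of my support vector classifier using GridSearchCV. However, some combinations of parameters are forbidden by LinearSVC and throw an exception. In particular, there are mutually exclusive combinations of the dual, penalty, and loss parameters.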

For example, this code:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
              'loss': ['hinge', 'squared_hinge']}
svc = svm.LinearSVC()
clf = GridSearchCV(svc, parameters)
clf.fit(iris.data, iris.target)

Returns ValueError: Unsupported set of arguments: The combination of penalty='l2' and loss='hinge' are not supported when dual=False, Parameters: penalty='l2', loss='hinge', dual=False

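My question is: is it possible to make GridSearchCV skip combinations of parameters that the model forbids? If not, is there an easy way to construct a parameter space that won't violate the rules?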

Answer 1

I solved this problem by passing error_score=0.0 to GridSearchCV:

error_score : ‘raise’ (default) or numeric

Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error.

Update: newer versions of sklearn print a bunch of ConvergenceWarning and FitFailedWarning messages. I had a hard time suppressing them with contextlib.suppress, but there is a hack around that involving a testing context manager:

from sklearn import svm, datasets
# Private testing helper; it may move or change between sklearn releases
from sklearn.utils._testing import ignore_warnings
from sklearn.exceptions import FitFailedWarning, ConvergenceWarning
from sklearn.model_selection import GridSearchCV

# The warning classes are passed as a tuple
with ignore_warnings(category=(ConvergenceWarning, FitFailedWarning)):
    iris = datasets.load_iris()
    parameters = {'dual': [True, False], 'penalty': ['l1', 'l2'],
                  'loss': ['hinge', 'squared_hinge']}
    svc = svm.LinearSVC()
    # Forbidden combinations are scored 0.0 instead of raising
    clf = GridSearchCV(svc, parameters, error_score=0.0)
    clf.fit(iris.data, iris.target)
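A note on why contextlib.suppress does not help here: it only swallows exceptions, and warnings are not exceptions unless a filter turns them into errors. The standard library's warnings.catch_warnings context manager is the usual public alternative to the private testing helper. Below is a minimal sketch of that pattern. DemoFitWarning, noisy_fit and run_quietly are hypothetical names invented for illustration; in real code you would pass FitFailedWarning and ConvergenceWarning and call clf.fit inside the wrapper. Filters set this way apply to the current process, so they may not reach worker processes when GridSearchCV runs with n_jobs greater than 1.

```python
import warnings


class DemoFitWarning(UserWarning):
    """Hypothetical stand-in for sklearn.exceptions.FitFailedWarning."""


def noisy_fit():
    """Hypothetical stand-in for a fit that warns and returns a fallback score."""
    warnings.warn("fit failed for this parameter combination", DemoFitWarning)
    return 0.0


def run_quietly(func, categories):
    """Call func() while ignoring the given warning categories.

    The previous warning filters are restored when the block exits.
    """
    with warnings.catch_warnings():
        for category in categories:
            warnings.simplefilter("ignore", category=category)
        return func()


score = run_quietly(noisy_fit, [DemoFitWarning])
print(score)
```

Other warning categories still get through, which is usually what you want: only the noise you explicitly named is silenced.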

Answer 2

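If you want to avoid exploring specific combinations entirely (without waiting to run into errors), you have to construct the grid yourself. GridSearchCV can take a list of dicts, in which case the grid spanned by each dictionary in the list is explored.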
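To make the list-of-dicts form concrete, here is a minimal pure-Python sketch of how such a grid expands into individual candidates. The helper expand_grid is a hypothetical stand-in written for illustration (inside scikit-learn, sklearn.model_selection.ParameterGrid performs this expansion), and the hand-written grid assumes the LinearSVC restrictions described in this thread.

```python
from itertools import product


def expand_grid(param_grid):
    """Expand a list of dicts into individual parameter settings.

    Each dict spans its own small grid (the Cartesian product of its
    value lists); the per-dict results are concatenated in order.
    """
    candidates = []
    for grid in param_grid:
        names = sorted(grid)
        for values in product(*(grid[name] for name in names)):
            candidates.append(dict(zip(names, values)))
    return candidates


# Hand-written grid covering only the combinations LinearSVC accepts
# (an assumption based on the rules discussed in this thread).
param_grid = [
    {'penalty': ['l2'], 'loss': ['hinge'], 'dual': [True]},
    {'penalty': ['l2'], 'loss': ['squared_hinge'], 'dual': [True, False]},
    {'penalty': ['l1'], 'loss': ['squared_hinge'], 'dual': [False]},
]

for params in expand_grid(param_grid):
    print(params)
```

The same param_grid list can be passed as the second argument to GridSearchCV, so none of the forbidden combinations is ever fitted.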

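For LinearSVC the conditional logic needed to filter the grid is not too bad, but it would get really tedious for anything more complicated: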

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
from itertools import product

iris = datasets.load_iris()

duals = [True, False]
penalties = ['l1', 'l2']
losses = ['hinge', 'squared_hinge']
all_params = list(product(duals, penalties, losses))
# Keep only the combinations LinearSVC accepts, one single-point grid each
filtered_params = [{'dual': [dual], 'penalty': [penalty], 'loss': [loss]}
                   for dual, penalty, loss in all_params
                   if not (penalty == 'l1' and loss == 'hinge')
                   and not (penalty == 'l1' and loss == 'squared_hinge' and dual is True)
                   and not (penalty == 'l2' and loss == 'hinge' and dual is False)]

svc = svm.LinearSVC()
clf = GridSearchCV(svc, filtered_params)
clf.fit(iris.data, iris.target)
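When the rules get more involved, it can help to separate them from the grid construction. The sketch below is one possible way to do that, not part of the original answer: build_param_grid and linearsvc_supported are hypothetical helper names, and the predicate encodes the same three exclusions as the filter above, just stated positively.

```python
from itertools import product


def linearsvc_supported(penalty, loss, dual):
    """Return True for combinations LinearSVC accepts (rules as listed in this answer)."""
    if penalty == 'l1':
        # l1 is only accepted with squared_hinge and dual=False
        return loss == 'squared_hinge' and not dual
    # penalty == 'l2': hinge additionally requires dual=True
    return dual or loss == 'squared_hinge'


def build_param_grid(space, is_valid):
    """Return a list of single-point grids, one per valid combination."""
    names = list(space)
    grid = []
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        if is_valid(**params):
            grid.append({name: [value] for name, value in params.items()})
    return grid


space = {'dual': [True, False],
         'penalty': ['l1', 'l2'],
         'loss': ['hinge', 'squared_hinge']}
param_grid = build_param_grid(space, linearsvc_supported)
print(len(param_grid), "valid combinations")
```

The resulting list can be handed to GridSearchCV in place of filtered_params. Keeping the predicate as a named function makes the rules easier to test and to update if a later scikit-learn release changes what LinearSVC accepts.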
