Why am I getting an error when the 'param_distributions' parameter of RandomizedSearchCV is given a list of dictionaries?
I can see in the documentation that the parameter param_distributions accepts a dict or a list of dicts. My code works when I pass a dict, but as soon as I pass a list of dicts, I get an error.
from sklearn.model_selection import train_test_split
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
import pandas as pd
import numpy as np
breast_cancer = load_breast_cancer()
# Reuse the loaded dataset instead of calling load_breast_cancer() again
df = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
df['target'] = pd.Series(breast_cancer.target)
df.head()
Xi = df.iloc[:, :-1]  # all feature columns
Yi = df.iloc[:, -1]   # target column
x_train1, x_test1, y_train1, y_test1 = train_test_split(Xi, Yi, train_size=0.9)
dist = [{'C': np.random.uniform(34,89,4)}, {"C": np.random.uniform(2, 16, 5)}] # {"C": uniform(4, 97)}
rcv = RandomizedSearchCV(estimator = LogisticRegression(), cv = 5, scoring= 'roc_auc', n_jobs= 5,
param_distributions= dist, n_iter = 10)
rcv.fit(x_train1, y_train1)
Output:
AttributeError Traceback (most recent call last)
AttributeError: 'list' object has no attribute 'values'
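An error like this is consistent with a scikit-learn release older than 0.22, since support for a list of dicts in param_distributions was added in that version. The following is a minimal sketch for checking the installed version; the helper name supports_list_of_dicts is hypothetical and it only compares the major and minor version numbers.

```python
import re

import sklearn


def supports_list_of_dicts(version: str) -> bool:
    """Return True if `version` is 0.22 or newer (major.minor comparison only).

    Hypothetical helper for illustration, not part of scikit-learn.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 22)


# Report the installed version and whether a list of dicts should be accepted.
print(sklearn.__version__, supports_list_of_dicts(sklearn.__version__))
```

If this reports False, upgrading scikit-learn or passing a single dict are the two obvious ways out.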
When I replace this list of dicts with a single dict, my code works fine, e.g.:
dist = {'C': np.random.uniform(34,89,45)}
rcv = RandomizedSearchCV(estimator = LogisticRegression(), cv = 5, scoring= 'roc_auc', n_jobs= 5,
param_distributions= dist, n_iter = 20)
rcv.fit(x_train1, y_train1)
Output:
RandomizedSearchCV(cv=5, error_score='raise-deprecating',
estimator=LogisticRegression(C=1.0, class_weight=None,
dual=False, fit_intercept=True,
intercept_scaling=1,
l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None,
penalty='l2', random_state=None,
solver='warn', tol=0.0001,
verbose=0, warm_start=False),
iid='warn', n_iter=20, n_jobs=5,
param_distributions...
68.32247988, 53.2886396 , 64.71957325, 53.42115708, 66.06577109,
54.09200687, 87.22769322, 81.02240252, 55.25783926, 84.31009298,
71.13884939, 85.74823239, 87.23400718, 54.48527833, 59.49131351,
63.59157499, 38.9348315 , 51.5738502 , 82.72414647, 75.27901268,
42.63960409, 40.65314118, 56.97608301, 66.41059041, 58.37528729])},
pre_dispatch='2*n_jobs', random_state=None, refit=True,
return_train_score=False, scoring='roc_auc', verbose=0)
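The code above works with scikit-learn v0.22.2, as suggested by @SergeyBushmanov.
I also had to adjust the value of the n_iter parameter to avoid warning messages. It was previously set to 20, which is why those warnings appeared. The warnings are legitimate: dist = [{'C': np.random.uniform(34,89,4)}, {"C": np.random.uniform(2, 16, 5)}] is a list of two dicts for the same hyperparameter ("C"), and each dict is treated as its own grid, so there are 4 + 5 = 9 candidate values in total (not 4 x 5 = 20, which would only apply to two different hyperparameters inside one dict).
n_iter specifies how many candidates to try. When every entry is a plain list or array, RandomizedSearchCV samples without replacement, so if n_iter is larger than the number of candidates it warns and runs only the candidates that exist. Keeping n_iter at or below that number avoids the warning; with n_iter = 5, for example, RandomizedSearchCV tries 5 randomly chosen values out of the 9.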
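To see how many candidates a list of dicts actually yields, the search space can be inspected with ParameterGrid and ParameterSampler from sklearn.model_selection. This is only a sketch: it assumes scikit-learn 0.22 or newer, and the fixed seed and the variable names n_candidates, sampled and caught are illustrative choices, not part of the original code.

```python
import warnings

import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same shape of search space as in the question, with a fixed seed (assumption).
rng = np.random.RandomState(0)
dist = [
    {"C": rng.uniform(34, 89, 4)},
    {"C": rng.uniform(2, 16, 5)},
]

# A list of dicts is a union of grids, so the sizes add up: 4 + 5 = 9.
n_candidates = len(ParameterGrid(dist))
print("candidates:", n_candidates)

# Asking for more iterations than there are candidates triggers a warning
# and the sampler falls back to the number of candidates available.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sampled = list(ParameterSampler(dist, n_iter=20, random_state=0))
print("sampled:", len(sampled), "warnings:", len(caught))
```

Setting n_iter to at most n_candidates should keep the search quiet. To draw more values than that, a continuous distribution such as scipy.stats.uniform can be used instead of fixed arrays.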