Issue with grid cross-validation in a decision tree regressor
Suppose I have defined a regressor like this:
tree = MultiOutputRegressor(DecisionTreeRegressor(random_state=0))
tree.fit(X_train, y_train)
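Now I want to run a grid-search cross-validation to tune the parameter ccp_alpha (I don't know whether it is the best parameter to tune, but I'm using it as an example). So I did this: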
alphas = np.arange(0,2,0.1)
pipe_tree = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('tree', tree)], memory = 'tmp')
treeCV = GridSearchCV(pipe_tree, dict( pca__n_components=n_components, tree__ccp_alpha=alphas ), cv=5, scoring ='r2', n_jobs=-1)
start_time = time.time()
treeCV.fit(X_train, y_train)
The problem is that I get this error:
ValueError: Invalid parameter ccp_alpha for estimator Pipeline(memory='tmp',
steps=[('scaler', StandardScaler()), ('pca', PCA()),
('tree',
MultiOutputRegressor(estimator=DecisionTreeRegressor(random_state=0)))]). Check the list of available parameters with `estimator.get_params().keys()`.
If I run tree.get_params().keys(), it prints the list of parameters I can change in my model. I think the problem is the tree__ccp_alpha=alphas entry in the GridSearchCV() call, but whatever I change, it doesn't work.
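I'm not sure what tree is in your post, but it looks like a multi-output regressor wrapped around your decision tree. If you set it up correctly, it should work. First we define the parameters: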
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier
import numpy as np
alphas = np.arange(0,2,0.1)
n_components = [3,4,5]
Then set up the steps:
scaler = StandardScaler()
pca = PCA()
from sklearn.multioutput import MultiOutputRegressor
tree = MultiOutputRegressor(DecisionTreeClassifier())
Toy data:
X_train = np.random.normal(0,1,(100,10))
y_train = np.random.binomial(1,0.5,(100,2))
The pipeline, followed by the parameters that tree exposes:
pipe_tree = Pipeline(steps=[('scaler', scaler), ('pca', pca), ('tree', tree)])
tree.get_params()
{'estimator__ccp_alpha': 0.0,
'estimator__class_weight': None,
'estimator__criterion': 'gini',
'estimator__max_depth': None,
'estimator__max_features': None,
'estimator__max_leaf_nodes': None,
'estimator__min_impurity_decrease': 0.0,
'estimator__min_impurity_split': None,
'estimator__min_samples_leaf': 1,
'estimator__min_samples_split': 2,
'estimator__min_weight_fraction_leaf': 0.0,
'estimator__presort': 'deprecated',
'estimator__random_state': None,
'estimator__splitter': 'best',
'estimator': DecisionTreeClassifier(),
'n_jobs': None}
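The listing above shows that every parameter of the inner tree is exposed under an estimator__ prefix. As a minimal sketch of what that means in practice, the snippet below sets ccp_alpha on a wrapper directly. It assumes a DecisionTreeRegressor (as in the question) rather than the classifier used in this answer, and the names wrapped and rejected are only illustrative.

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeRegressor

# Hypothetical wrapper mirroring the question's setup.
wrapped = MultiOutputRegressor(DecisionTreeRegressor(random_state=0))

# The nested name reaches through the wrapper to the inner tree.
wrapped.set_params(estimator__ccp_alpha=0.1)
print(wrapped.estimator.ccp_alpha)

# The bare name is not a parameter of the wrapper itself, so it is rejected.
try:
    wrapped.set_params(ccp_alpha=0.1)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

The same rule applies one level up: inside a Pipeline, the step name is added in front of whatever name the step itself accepts.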
The parameter should be estimator__ccp_alpha. So if we prefix it with the pipeline step name tree, using tree__estimator__ccp_alpha=alphas works:
treeCV = GridSearchCV(pipe_tree, dict( pca__n_components=n_components, tree__estimator__ccp_alpha=alphas ),
cv=5, scoring ='r2', n_jobs=-1)
treeCV.fit(X_train, y_train)
If I use yours:
treeCV = GridSearchCV(pipe_tree, dict( pca__n_components=n_components, tree__ccp_alpha=alphas ),
cv=5, scoring ='r2', n_jobs=-1)
I get the same error.
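For completeness, here is a self-contained sketch of the same fix using a DecisionTreeRegressor as in the question. The data is random and purely illustrative, and the names X_demo, y_demo, pipe_demo and search are assumptions introduced for this example. It first looks up the valid grid key from the pipeline itself, then runs the search with that key.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Illustrative toy data: 100 samples, 10 features, 2 continuous targets.
rng = np.random.default_rng(0)
X_demo = rng.normal(0, 1, (100, 10))
y_demo = rng.normal(0, 1, (100, 2))

pipe_demo = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("tree", MultiOutputRegressor(DecisionTreeRegressor(random_state=0))),
])

# Ask the pipeline which parameter names it accepts and keep the ccp_alpha one.
valid_names = pipe_demo.get_params().keys()
ccp_names = [name for name in valid_names if name.endswith("ccp_alpha")]
print(ccp_names)  # ['tree__estimator__ccp_alpha']

# Use the full nested name as the grid key: step name, then wrapper attribute.
param_grid = {
    "pca__n_components": [3, 4, 5],
    "tree__estimator__ccp_alpha": np.arange(0, 2, 0.1),
}
search = GridSearchCV(pipe_demo, param_grid, cv=5, scoring="r2")
search.fit(X_demo, y_demo)
print(search.best_params_)
```

Filtering pipe_demo.get_params().keys() this way is a quick check whenever GridSearchCV reports an invalid parameter, since the grid keys must match those names exactly.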