Custom Regressor: GridSearchCV says 'get_params' is not implemented when inheriting from BaseEstimator
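Hello,
Thanks for taking the time to look at this post.
I am trying to implement a scikit-learn API version of this blog post (the data is available here). My custom class reproduces the author's results, but it does not work with GridSearchCV.
Essentially, he applies partial least squares regression to some spectral data, and the optimal number of components is taken to be the one that yields the lowest MSE. My attempt is shown below. I can reproduce the author's MSE result for the best calibration, and the default parameters in __init__ below are set to those values. Note that I inherit from BaseEstimator and RegressorMixin.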
import pandas as pd
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.utils.validation import check_X_y, check_array

# Download the .csv from the GitHub repo linked in the blog post
# Create df, shuffle it, then create `X` and `y`
df = pd.read_csv("nirpyresearch/data/peach_spectra+brixvalues.csv")
df = df.sample(replace=False, frac=1).copy()
y = df['Brix'].values
X = df[[i for i in list(df.columns) if 'wl' in i]].values

class SavgolPLS(BaseEstimator, RegressorMixin):
    """My Regressor"""

    def __init__(self, savgol_window=17, savgol_polyorder=2, savgol_deriv=2, pls_components=7):
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components

    def fit(self, X, y):
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
        self.X_ = X
        self.y_ = y
        # Smooth/differentiate the spectra, then fit PLS on the filtered data
        self.X_savgol_ = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        self.pls_ = PLSRegression(n_components=self.pls_components).fit(self.X_savgol_, self.y_)
        # Return the fitted regressor
        return self

    def predict(self, X, apply_savgol=True):
        # Check that fit has been called
        #check_is_fitted(self)
        # Input validation
        X = check_array(X)
        if apply_savgol:
            X = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        pred_y = self.pls_.predict(X)
        return pred_y

    def score(self, y_pred):
        # MSE of y_pred against the training targets stored by fit()
        mse = mean_squared_error(y_true=self.y_, y_pred=y_pred)
        return mse
I can now instantiate the model and call .get_params() to get a dictionary containing the 4 parameters from __init__.
s_pls = SavgolPLS(pls_components=7)
s_pls.get_params()
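For reference, here is a minimal sketch of what get_params() returns for a BaseEstimator subclass. The class name ParamsOnly is hypothetical and only mirrors the four constructor arguments of SavgolPLS; it has no fit or predict and needs no data.

```python
from sklearn.base import BaseEstimator


class ParamsOnly(BaseEstimator):
    """Hypothetical class with the same four constructor arguments as SavgolPLS."""

    def __init__(self, savgol_window=17, savgol_polyorder=2,
                 savgol_deriv=2, pls_components=7):
        # Each argument is stored unchanged under the same attribute name;
        # get_params() reads the __init__ signature and looks these up.
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components


est = ParamsOnly(pls_components=5)

# get_params() is called on an instance and returns a plain dict
params = est.get_params()
print(params)

# set_params() is the counterpart used to change values on an existing instance
est.set_params(savgol_window=11)
updated_window = est.get_params()["savgol_window"]
print(updated_window)
```

Both calls operate on an instance, because the values they read and write are stored as instance attributes.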
So get_params() does seem to exist, which makes sense because it is inherited from BaseEstimator. I can also use the fit() method to reproduce the author's result.
s_pls = s_pls.fit(X = X, y = y)
y_pred = s_pls.predict(X)
#This should be ~0.6566
s_pls.score(y_pred)
So why does applying GridSearchCV in the code below produce the error shown?
parameters ={'savgol_window':[3,30], 'savgol_polyorder':[2,4], 'savgol_deriv':[1,3], 'pls_components':[2,15]}
clf = GridSearchCV(SavgolPLS, parameters, cv = 10)
clf.fit(X, y)
yields
TypeError Traceback (most recent call last)
<ipython-input-22-e20c1eabb4fa> in <module>
----> 1 clf.fit(X, y.ravel())
C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
631 n_splits = cv.get_n_splits(X, y, groups)
632
--> 633 base_estimator = clone(self.estimator)
634
635 parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\base.py in clone(estimator, safe)
58 "it does not seem to be a scikit-learn estimator "
59 "as it does not implement a 'get_params' methods."
---> 60 % (repr(estimator), type(estimator)))
61 klass = estimator.__class__
62 new_object_params = estimator.get_params(deep=False)
TypeError: Cannot clone object '<class '__main__.SavgolPLS'>' (type <class 'type'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
Thanks for your help!
You are passing the class to GridSearchCV; you should pass an instance instead: clf = GridSearchCV(SavgolPLS(), parameters, cv = 10)
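The difference can be reproduced without the peach data. The sketch below is only an illustration under stated assumptions: ScaledRidge is a hypothetical stand-in for SavgolPLS, the data is synthetic, and the mixin is listed before BaseEstimator, which is the order the current scikit-learn documentation recommends. It calls clone() directly, since that is the function that raises in the traceback above.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


class ScaledRidge(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for SavgolPLS: rescales X, then fits a Ridge model."""

    def __init__(self, scale=1.0, alpha=1.0):
        self.scale = scale
        self.alpha = alpha

    def fit(self, X, y):
        self.model_ = Ridge(alpha=self.alpha).fit(np.asarray(X) * self.scale, y)
        return self

    def predict(self, X):
        return self.model_.predict(np.asarray(X) * self.scale)


# Synthetic regression data (not the data set from the question)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=60)

# Passing the class itself: clone() rejects it with a TypeError
try:
    clone(ScaledRidge)
    class_clone_failed = False
except TypeError as exc:
    class_clone_failed = True
    print("clone(class) failed:", exc)

# Passing an instance: clone() returns a new unfitted copy with the same parameters
instance_clone = clone(ScaledRidge(alpha=0.5))

# GridSearchCV therefore needs an instance as its first argument
param_grid = {"scale": [0.5, 1.0], "alpha": [0.1, 1.0, 10.0]}
search = GridSearchCV(ScaledRidge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
best_params = search.best_params_
print(best_params)
```

One further point to check once an instance is passed: when no scoring argument is given, GridSearchCV calls the estimator's score(X, y) and treats larger values as better. The score(self, y_pred) method in the question has a different signature and returns an MSE, so either changing it or passing scoring="neg_mean_squared_error" as in the sketch is likely to be needed.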