Custom Regressor: GridSearchCV says 'get_params' is not implemented when inheriting from BaseEstimator

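Hello,

Thanks for taking the time to read this post.

I am trying to implement a scikit-learn API version of this blog post (the data is available here). My custom class reproduces the author's results, but it does not work with GridSearchCV.

In essence, he performed partial least squares regression on some spectral data, with the optimal number of components chosen as the one that yields the lowest MSE. My attempt is shown below. I am able to replicate the author's MSE result for the best calibration, and the default arguments of __init__ below are set to those parameters. Note that I inherit from BaseEstimator and RegressorMixin.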

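# Download the .csv from the GitHub repo linked in the blog post
import pandas as pd
from scipy.signal import savgol_filter
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted

# Create df, shuffle it, then build `X` and `y`
df = pd.read_csv("nirpyresearch/data/peach_spectra+brixvalues.csv")
df = df.sample(replace=False, frac=1).copy()
y = df['Brix'].values
X = df[[i for i in list(df.columns) if 'wl' in i]].values

class SavgolPLS(BaseEstimator, RegressorMixin):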
    """My Regressor"""
    def __init__(self, savgol_window=17, savgol_polyorder=2, savgol_deriv=2, pls_components=7):
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components

    def fit(self, X, y):

        # Check that X and y have correct shape
        X, y = check_X_y(X, y)


        self.X_ = X
        self.y_ = y
        self.X_savgol_ = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        self.pls_ = PLSRegression(n_components=self.pls_components).fit(self.X_savgol_, self.y_)
        # Return the fitted regressor
        return self

    def predict(self, X, apply_savgol = True):

        # Check that fit has been called
        #check_is_fitted(self)

        # Input validation
        X = check_array(X)
        if apply_savgol:
            X = savgol_filter(X, self.savgol_window, self.savgol_polyorder, self.savgol_deriv)
        pred_y = self.pls_.predict(X)
        return pred_y

    def score(self, y_pred):
        mse = mean_squared_error(y_true=self.y_, y_pred=y_pred)
        return mse


I can now instantiate the model and use .get_params() to get a dictionary containing the 4 parameters from __init__.

s_pls = SavgolPLS(pls_components=7)
s_pls.get_params()
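
For reference, here is a minimal sketch of what that call returns. SavgolPLSStub is a hypothetical stand-in with the same __init__ signature as SavgolPLS (fit and predict are left out), so it runs without the peach data. BaseEstimator builds the dictionary by inspecting the __init__ signature and reading the attributes of the same names from the instance.

```python
from sklearn.base import BaseEstimator, RegressorMixin


class SavgolPLSStub(BaseEstimator, RegressorMixin):
    """Hypothetical stand-in: same __init__ as SavgolPLS, no fit/predict."""

    def __init__(self, savgol_window=17, savgol_polyorder=2,
                 savgol_deriv=2, pls_components=7):
        # Each argument is stored under an attribute of the same name,
        # which is what get_params() relies on.
        self.savgol_window = savgol_window
        self.savgol_polyorder = savgol_polyorder
        self.savgol_deriv = savgol_deriv
        self.pls_components = pls_components


stub = SavgolPLSStub(pls_components=7)
params = stub.get_params()
print(params)
```

Note that get_params() is called here on an instance, not on the class.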

So get_params() does seem to be there, which makes sense because it is inherited from BaseEstimator. I can also use the fit() method to replicate the author's results.

s_pls = s_pls.fit(X = X, y = y)
y_pred = s_pls.predict(X)

#This should be ~0.6566
s_pls.score(y_pred)

So why does applying GridSearchCV in the code below produce the error shown?

parameters = {'savgol_window': [3, 30], 'savgol_polyorder': [2, 4], 'savgol_deriv': [1, 3], 'pls_components': [2, 15]}
clf = GridSearchCV(SavgolPLS, parameters, cv = 10)
clf.fit(X, y)

yields

TypeError                                 Traceback (most recent call last)
<ipython-input-22-e20c1eabb4fa> in <module>
----> 1 clf.fit(X, y.ravel())

C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    631         n_splits = cv.get_n_splits(X, y, groups)
    632 
--> 633         base_estimator = clone(self.estimator)
    634 
    635         parallel = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,

C:\tools\Anaconda3\envs\dev_py37_tf\lib\site-packages\sklearn\base.py in clone(estimator, safe)
     58                             "it does not seem to be a scikit-learn estimator "
     59                             "as it does not implement a 'get_params' methods."
---> 60                             % (repr(estimator), type(estimator)))
     61     klass = estimator.__class__
     62     new_object_params = estimator.get_params(deep=False)

TypeError: Cannot clone object '<class '__main__.SavgolPLS'>' (type <class 'type'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

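Thanks for your help!

You are passing the class itself to GridSearchCV; you should pass an instance instead: clf = GridSearchCV(SavgolPLS(), parameters, cv = 10). The error message hints at this: the object it cannot clone has type <class 'type'>, which means it is the class rather than an instance of it.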
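
The difference can be reproduced without the peach data. The sketch below is only an illustration under stated assumptions: ToyRegressor, the synthetic X and y, and the alpha grid are all invented here and are not part of the question. It calls clone() directly, because that is the call that fails in the traceback. Passing scoring="neg_mean_squared_error" is also my own addition: GridSearchCV keeps the candidate with the highest score, so an MSE-based search needs the negated metric.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV


# Hypothetical toy regressor standing in for SavgolPLS (mixin listed first,
# BaseEstimator last, as the scikit-learn developer guide recommends).
class ToyRegressor(RegressorMixin, BaseEstimator):
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.model_ = Ridge(alpha=self.alpha).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)


# Synthetic data, invented for this sketch only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

# Cloning the class itself fails; this is the step GridSearchCV.fit hit.
class_error = None
try:
    clone(ToyRegressor)
except TypeError as exc:
    class_error = exc
print("cloning the class:", class_error)

# Cloning an instance works, so the search can copy and refit it per fold.
search = GridSearchCV(
    ToyRegressor(),
    {"alpha": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("best params:", search.best_params_)
```

Applied to the question's code, the same pattern would presumably read GridSearchCV(SavgolPLS(), parameters, cv=10, scoring="neg_mean_squared_error"). With an explicit scoring argument the search scores candidates through predict(), so the custom score(y_pred) method is not called.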