Combine two fitted estimators into a pipeline

Question

I have data in two stages:

import numpy as np

data_pre = np.array([[1., 2., 203.],
                     [0.5, np.nan, 208.]])

data_post = np.array([[2., 2., 203.],
                      [0.5, 2., 208.]])

I also have two pre-existing fitted estimators:

from sklearn.preprocessing import Imputer
from sklearn.ensemble import GradientBoostingRegressor

imp = Imputer(missing_values=np.nan, strategy='mean', axis=1).fit(data_pre)
gbm = GradientBoostingRegressor().fit(data_post[:,:2], data_post[:,2])

I need to pass a fitted pipeline and data_pre to another function.

def the_function_i_need(estimators):
    """
    Combine the already-fitted estimators into a fitted pipeline.
    """
    ...  # should return the fitted pipeline

fitted_pipeline = the_function_i_need([imp, gbm])
sweet_output = static_function(fitted_pipeline, data_pre)

Is there a way to combine these two existing, fitted model objects into a fitted pipeline without refitting the models, or am I out of luck?

Answer 1

I looked into this and could not find any direct way to do it.

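I think the only way is to write a custom transformer that acts as a wrapper around your existing Imputer and GradientBoostingRegressor. You can initialize the wrapper with the already-fitted regressor and/or imputer, then override fit so that it does nothing. On every subsequent transform call, the wrapper calls transform on the underlying fitted model. This is a dirty way of doing it and should not be done unless it is very important for your application. A good tutorial on writing custom classes for scikit-learn pipelines can be found here. Another working example of custom pipeline objects from scikit-learn's documentation can be found here.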
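A minimal sketch of such a wrapper, in case it helps. It relies only on duck typing and does not inherit from any scikit-learn base class. The class name PrefittedPipeline and the two stand-in classes FillMissing and SumRows are hypothetical, made up here so the example runs on its own. In practice you would pass your fitted imp and gbm instead.

```python
class PrefittedPipeline:
    """Hypothetical wrapper that chains already-fitted steps without refitting."""

    def __init__(self, transformers, final_estimator):
        # Fitted objects exposing .transform(X)
        self.transformers = list(transformers)
        # Fitted object exposing .predict(X)
        self.final_estimator = final_estimator

    def fit(self, X, y=None):
        # Deliberately a no-op: the wrapped objects are assumed to be fitted.
        return self

    def predict(self, X):
        # Push the data through each fitted transformer, then predict.
        for step in self.transformers:
            X = step.transform(X)
        return self.final_estimator.predict(X)


# Stand-ins for fitted estimators (hypothetical, only to make the sketch runnable).
class FillMissing:
    def __init__(self, fill_value):
        self.fill_value = fill_value

    def transform(self, X):
        # Replace None entries with the stored fill value.
        return [[self.fill_value if v is None else v for v in row] for row in X]


class SumRows:
    def predict(self, X):
        # Toy "model": the prediction is the sum of each row.
        return [sum(row) for row in X]


wrapper = PrefittedPipeline([FillMissing(2.0)], SumRows())
wrapper.fit([[0.0, 0.0]])  # does nothing, by design
predictions = wrapper.predict([[1.0, 2.0], [0.5, None]])
print(predictions)
```

The trade-off is the one mentioned above: anything that calls fit on this object silently gets the old models back, which is easy to forget later.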

Answer 2

...a few years later. Use make_pipeline() to chain the scikit-learn estimators:

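from sklearn.pipeline import make_pipeline

# make_pipeline only chains the objects; it does not refit them
new_model = make_pipeline(fitted_preprocessor,
                          fitted_model)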
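Here is a sketch of how that might look with estimators like the ones in the question. A few assumptions on my part: SimpleImputer from sklearn.impute stands in for the question's Imputer, which is no longer available in recent scikit-learn releases. Both estimators are fitted on the two feature columns only, so the imputer's output has the shape the regressor expects. random_state=0 is added just to make the run repeatable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

# Feature columns only; the target is kept separate.
X_pre = np.array([[1.0, 2.0],
                  [0.5, np.nan]])
X_post = np.array([[2.0, 2.0],
                   [0.5, 2.0]])
y_post = np.array([203.0, 208.0])

# Fit each estimator on its own, as in the question.
imp = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X_pre)
gbm = GradientBoostingRegressor(random_state=0).fit(X_post, y_post)

# Chain the fitted objects; fit is never called on the pipeline.
fitted_pipeline = make_pipeline(imp, gbm)

# The pipeline should give the same result as applying the steps by hand.
manual = gbm.predict(imp.transform(X_pre))
piped = fitted_pipeline.predict(X_pre)
print(np.allclose(manual, piped))
```

Keep in mind that calling fitted_pipeline.fit(...) later would refit both steps, so only use predict if you want to keep the existing models.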

pipeline

scikit-learn