Sklearn Pipeline with multiple estimators

Question

I get an error when I chain estimators and try to view the result. I am new to Python, and this is my first time trying the Pipeline feature.

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
from sklearn.decomposition import PCA

estimator=[('dim_reduction',PCA()),('logres_model',LogisticRegression()),('linear_model',LinearRegression())]

pipeline_estimator=Pipeline(estimator)

Error message

TypeError                                 Traceback (most recent call last)
<ipython-input-196-44549764413a> in <module>
----> 1 pipeline_estimator=Pipeline(estimator)

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     71                           FutureWarning)
     72         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73         return f(**kwargs)
     74     return inner_f
     75 

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in __init__(self, steps, memory, verbose)
    112         self.memory = memory
    113         self.verbose = verbose
--> 114         self._validate_steps()
    115 
    116     def get_params(self, deep=True):

D:\Anaconda\lib\site-packages\sklearn\pipeline.py in _validate_steps(self)
    157             if (not (hasattr(t, "fit") or hasattr(t, "fit_transform")) or not
    158                     hasattr(t, "transform")):
--> 159                 raise TypeError("All intermediate steps should be "
    160                                 "transformers and implement fit and transform "
    161                                 "or be the string 'passthrough' "

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression()' (type <class 'sklearn.linear_model._logistic.LogisticRegression'>) doesn't

Answer 1

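As the error states, all intermediate steps in a Pipeline must be transformers (used for feature transformation) that implement fit and transform, but you have chained two models. You should have only one model, and it must be the last step of the pipeline.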
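To make the rule concrete, here is a minimal sketch, assuming the iris dataset as stand-in data (the question does not show its data) and hypothetical step names. It fits a pipeline with a single final model, then shows that placing a model in an intermediate position raises the TypeError. Depending on the scikit-learn version, that error is raised either when the Pipeline is constructed or when it is fitted, so both calls sit inside the try block.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Example data only; substitute your own X and y.
X, y = load_iris(return_X_y=True)

# Valid: transformers first, exactly one model as the last step.
valid_pipeline = Pipeline([
    ('dim_reduction', PCA(n_components=2)),
    ('logres_model', LogisticRegression(max_iter=1000)),
])
valid_pipeline.fit(X, y)
predictions = valid_pipeline.predict(X)

# Invalid: a model in an intermediate position has no transform method.
try:
    invalid_pipeline = Pipeline([
        ('dim_reduction', PCA()),
        ('logres_model', LogisticRegression()),   # not a transformer
        ('second_model', LogisticRegression()),
    ])
    invalid_pipeline.fit(X, y)
    raised = False
except TypeError as err:
    raised = True
    print(err)
```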

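It looks like you may want to run a grid search that compares the two estimators, together with their corresponding pipelines and hyperparameter tuning. For that, use GridSearchCV with the defined Pipeline as the estimator: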

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.datasets import load_iris

# The 'clf' step is a placeholder that the grid search swaps out.
pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression()),
])
# One dict per candidate model, each with its own hyperparameters.
parameters = [
    {
        'clf': (LogisticRegression(),),
        'clf__C': (0.001, 0.01, 0.1, 1, 10, 100),
    }, {
        'clf': (RandomForestClassifier(),),
        'clf__n_estimators': (10, 30),
    }
]
grid_search = GridSearchCV(pipeline, parameters)

# some example dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
grid_search.fit(X_train, y_train)
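Once the search has been fitted, the winning model and its settings can be read back from the search object. The following is a self-contained sketch under a few assumptions that are not in the answer: a shortened C grid, max_iter=1000 to avoid convergence warnings, and fixed random_state values for repeatability.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('clf', LogisticRegression(max_iter=1000)),
])
parameters = [
    {'clf': (LogisticRegression(max_iter=1000),), 'clf__C': (0.1, 1, 10)},
    {'clf': (RandomForestClassifier(random_state=0),), 'clf__n_estimators': (10, 30)},
]
grid_search = GridSearchCV(pipeline, parameters)
grid_search.fit(X_train, y_train)

# best_params_ holds the chosen estimator object and its hyperparameters.
best_params = grid_search.best_params_
best_clf_name = type(best_params['clf']).__name__

# score() uses the refitted best pipeline; for classifiers this is accuracy.
test_accuracy = grid_search.score(X_test, y_test)
print(best_clf_name, best_params, test_accuracy)
```

Which model wins depends on the data and the split, so no particular output should be expected here.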

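Also note that you are mixing a classifier with a regressor. The above shows how to do this by combining two example classifiers. You may still want to spend some time understanding what kind of problem you are facing and which models are suitable for it.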
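If the target turns out to be continuous, the same pattern applies with regressors only. This is a sketch under assumptions: the diabetes dataset stands in for real data, Ridge is an arbitrary second regressor chosen for illustration, and the step name 'reg' is hypothetical.

```python
from sklearn.datasets import load_diabetes
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Example regression data with a continuous target.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg_pipeline = Pipeline([
    ('dim_reduction', PCA()),
    ('reg', LinearRegression()),
])
# Compare regressors only; classifiers do not belong in this grid.
reg_parameters = [
    {'reg': (LinearRegression(),)},
    {'reg': (Ridge(),), 'reg__alpha': (0.1, 1.0, 10.0)},
]
reg_search = GridSearchCV(reg_pipeline, reg_parameters)
reg_search.fit(X_train, y_train)

best_reg_name = type(reg_search.best_params_['reg']).__name__
# For regressors, score() returns the R^2 coefficient on the held-out data.
r2 = reg_search.score(X_test, y_test)
print(best_reg_name, r2)
```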
