Accessing transformer functions in `sklearn` pipelines

Question

The pipeline has all the methods that the last estimator in the pipeline has, i.e. if the last estimator is a classifier, the Pipeline can be used as a classifier. If the last estimator is a transformer, again, so is the pipeline.

The following example creates a dummy transformer with a custom dummy function f:

```python
class C:
    def fit(self, X, y=None):
        print('fit')
        return self
    def transform(self, X):
        print('transform')
        return X

    def f(self):
        print('abc')

from sklearn.pipeline import Pipeline
ppl = Pipeline([('C', C())])
```

I expected to be able to access the f function of the C transformer, but calling ppl.f() raises AttributeError: 'Pipeline' object has no attribute 'f'

Am I misreading the documentation? Is there a good, reliable way to access the functions of the last transformer?

Answer 1

The Pipeline documentation overstates things slightly. A pipeline has all the *estimator* methods of its last estimator, such as predict(), fit_predict(), fit_transform(), transform(), decision_function(), predict_proba(), and so on.
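A quick way to see which methods are delegated is to probe the pipeline with `hasattr`. The sketch below assumes a recent scikit-learn and uses a variant of the question's `C` that inherits from `BaseEstimator` and `TransformerMixin`; the `StandardScaler` first step is only there to make it a multi-step pipeline.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class C(BaseEstimator, TransformerMixin):
    """Variant of the question's dummy transformer (assumed base classes)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        return "abc"


ppl = Pipeline([("scale", StandardScaler()), ("C", C())])

# Standard estimator methods of the final step are exposed on the pipeline...
print(hasattr(ppl, "transform"))  # True: C defines transform
print(hasattr(ppl, "predict"))    # False: C has no predict
# ...but arbitrary custom methods are not.
print(hasattr(ppl, "f"))          # False
```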

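It cannot expose any other methods, because it would not know what to do with all the other steps in the pipeline. In most cases you pass (X) or possibly (X, y), and X and/or y must be passed through every step of the chain with fit_transform() or transform().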
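To illustrate that chaining, the sketch below replays `ppl.transform(X)` by hand, feeding X through each fitted step in order, and checks that the result matches. The toy data and the `StandardScaler` step are assumptions for illustration; `C` is again an identity transformer, and setting a trailing-underscore attribute in `fit` simply lets scikit-learn's fitted-check recognize it as fitted.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class C(BaseEstimator, TransformerMixin):
    """Identity transformer, a variant of the question's dummy class."""

    def fit(self, X, y=None):
        self.is_fitted_ = True  # trailing underscore marks the step as fitted
        return self

    def transform(self, X):
        return X


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ppl = Pipeline([("scale", StandardScaler()), ("C", C())]).fit(X)

# Manually replay what ppl.transform(X) does: pass X through every step in order.
Xt = X
for _, step in ppl.steps:
    Xt = step.transform(Xt)

print(np.allclose(Xt, ppl.transform(X)))  # True
```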

Accessing the last estimator directly is fairly easy:

```python
ppl.steps[-1][1].f()
```
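Besides `ppl.steps[-1][1]`, scikit-learn pipelines also support integer indexing and lookup by step name (indexing assumes scikit-learn 0.21 or later). The sketch below shows that all three return the same object; here `f` returns its string instead of printing it, purely so the result can be checked.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class C(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X

    def f(self):
        return "abc"  # returns instead of printing, so the result can be checked


ppl = Pipeline([("C", C())])

by_tuple = ppl.steps[-1][1]     # the approach from the answer
by_index = ppl[-1]              # integer indexing on the pipeline
by_name = ppl.named_steps["C"]  # lookup by the step's name
print(by_tuple is by_index is by_name)  # True
print(by_index.f())                     # abc
```

Looking a step up by name is less fragile if steps are later appended to the pipeline.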

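But keep in mind that doing this bypasses the earlier steps in the pipeline (i.e., if you pass it X, X will not be scaled by your StandardScaler or transformed by anything else you do earlier in the pipeline).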
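If the custom method needs data that has gone through the earlier steps, one option is to slice the fitted pipeline and run X through everything except the last step first. This is a sketch assuming scikit-learn 0.21 or later (for slicing); `column_means` is a hypothetical custom method standing in for the question's `f`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class C(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.is_fitted_ = True
        return self

    def transform(self, X):
        return X

    def column_means(self, X):
        # Hypothetical custom method that expects already-preprocessed data.
        return np.asarray(X).mean(axis=0)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ppl = Pipeline([("scale", StandardScaler()), ("C", C())]).fit(X)

raw = ppl[-1].column_means(X)      # bypasses the scaler: [3. 4.]
Xt = ppl[:-1].transform(X)         # run X through every step except the last
scaled = ppl[-1].column_means(Xt)  # close to [0. 0.], since the scaler centred X
print(raw, scaled)
```

The sliced pipeline shares the already-fitted step objects with `ppl`, so nothing is refitted.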
