How can I transform with scikit-learn Pipeline when the last estimator is not a transformer?

Question

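I have a pipeline and I want to run only its preprocessing and feature-engineering steps, but I can't use fit_transform() because RandomForestClassifier() has no such method.

I tried using the pipeline's _fit() method (since that is what fit() uses internally), but it raised a KeyError in my transformer.

Here is the pipeline: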

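from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# MostFrequentImputer, AggregateCategorical, CategoryConverter, MeanImputer and
# config are the asker's own objects, defined elsewhere.

# pipeline transformations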
_pipe = Pipeline(
    [
        (
            "most_frequent_imputer",
            MostFrequentImputer(features=config.model_config.impute_most_freq_cols),
        ),
        (
            "aggregate_high_cardinality_features",
            AggregateCategorical(features=config.model_config.high_cardinality_cats),
        ),
        (
            "get_categorical_codes",
            CategoryConverter(features=config.model_config.convert_to_category_codes),
        ),
        (
            "mean_imputer",
            MeanImputer(features=config.model_config.continuous_features),
        ),
        (
            "random_forest",
            RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=25),
        ),
    ]
)

Answer 1

You can do the following:

_pipe[:-1].fit_transform(X)

This selects every step except the last one, so you can call fit_transform() on the result. Note that doing so fits the preprocessing steps.
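As a minimal, self-contained sketch of the slicing approach: the example below swaps in scikit-learn's built-in SimpleImputer and StandardScaler for the asker's custom transformers, and the toy arrays are made up for illustration. It assumes a scikit-learn version that supports Pipeline slicing (0.21 or later).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value (illustrative only, not the asker's data).
X = np.array([[1.0, 2.0], [np.nan, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = np.array([0, 1, 0, 1])

# Stand-in pipeline: built-in transformers replace the custom ones.
pipe = Pipeline(
    [
        ("mean_imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
        ("random_forest", RandomForestClassifier(n_estimators=10, random_state=25)),
    ]
)

# Option 1: fit and apply only the preprocessing steps.
# pipe[:-1] is itself a Pipeline whose last step is a transformer.
X_pre = pipe[:-1].fit_transform(X)

# Option 2: fit the whole pipeline first, then reuse the fitted
# preprocessing steps on new data without refitting them.
pipe.fit(X, y)
X_new = np.array([[3.0, np.nan]])
X_new_pre = pipe[:-1].transform(X_new)
```

The slice is a shallow copy, so the sub-pipeline shares its estimator objects with the full pipeline. That is why option 2 can call transform() right after pipe.fit(), and also why calling fit_transform() on the slice refits the same transformer objects the full pipeline uses.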

python

machine-learning

scikit-learn

data-science