Composing StandardScaler with a custom transformer

I have a custom transformer:

from datetime import datetime

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DayOfYearTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self  # nothing else to do
    def transform(self, X):
        date_array = X['date']
        # tm_yday is the 1-based day of the year for each 'YYYY-MM-DD' string
        day_of_year = [datetime.strptime(date_str, '%Y-%m-%d').date().timetuple().tm_yday for date_str in date_array]
        # Append the day-of-year column to the remaining (non-date) columns
        return np.c_[X.copy().drop(columns=['date']), day_of_year]

It converts a date string into the day of the year, a number from 1 to 365 (366 in a leap year). Here is my pipeline:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = []
cat_attribs = ['country', 'store', 'product']
date_attribs = ['date']

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), cat_attribs),
    ("date", DayOfYearTransformer(), date_attribs),
])

merch_prepared = full_pipeline.fit_transform(merch)

I want to apply the sklearn.preprocessing.StandardScaler transformation to the output of

    ("date", DayOfYearTransformer(), date_attribs),

How would I do that?

You can combine the two with a pipeline that first runs DayOfYearTransformer and then StandardScaler. In code, that reads like:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), cat_attribs),
    ("date", make_pipeline(DayOfYearTransformer(), StandardScaler()), date_attribs),
])

I used the convenience function make_pipeline to build an sklearn Pipeline object.
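To check that the date column really comes out standardized, here is a minimal end-to-end sketch. The toy `merch` frame and its values are made up for illustration, and the transformer is a slightly simplified variant of the one in the question that returns only the day-of-year column as floats (an assumption on my part, since the ColumnTransformer passes it just the `date` column anyway).

```python
from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class DayOfYearTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # Parse each 'YYYY-MM-DD' string and take its 1-based day of the year.
        days = [datetime.strptime(s, '%Y-%m-%d').timetuple().tm_yday
                for s in X['date']]
        # Return a single float column so StandardScaler can consume it.
        return np.array(days, dtype=float).reshape(-1, 1)


# Hypothetical toy data standing in for the `merch` frame from the question.
merch = pd.DataFrame({
    'country': ['NO', 'SE', 'NO', 'FI'],
    'store': ['A', 'B', 'A', 'B'],
    'product': ['mug', 'hat', 'hat', 'mug'],
    'date': ['2021-01-01', '2021-01-03', '2021-01-05', '2021-01-07'],
})

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), ['country', 'store', 'product']),
    ("date", make_pipeline(DayOfYearTransformer(), StandardScaler()), ['date']),
])

merch_prepared = full_pipeline.fit_transform(merch)
# ColumnTransformer may return a sparse matrix; densify it for inspection.
if hasattr(merch_prepared, "toarray"):
    merch_prepared = merch_prepared.toarray()

# The "date" transformer is listed last, so its output is the last column.
date_column = merch_prepared[:, -1]
print(date_column.mean(), date_column.std())
```

The printed mean should be approximately 0 and the standard deviation approximately 1, which is what StandardScaler is expected to produce on the fitted data.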