Composing StandardScaler with a custom transformer
I have a custom transformer:
from datetime import datetime

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class DayOfYearTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self  # nothing else to do

    def transform(self, X):
        date_array = X['date']
        # Parse each 'YYYY-MM-DD' string and take its day of the year
        day_of_year = [datetime.strptime(date_str, '%Y-%m-%d').date().timetuple().tm_yday for date_str in date_array]
        # Append the new column to whatever non-date columns remain
        return np.c_[X.copy().drop(columns=['date']), day_of_year]
It converts a date string into the day of the year, i.e. a number between 1 and 365 (366 in a leap year). Here is my pipeline:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

num_attribs = []
cat_attribs = ['country', 'store', 'product']
date_attribs = ['date']

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), cat_attribs),
    ("date", DayOfYearTransformer(), date_attribs),
])

merch_prepared = full_pipeline.fit_transform(merch)
I want to apply the sklearn.preprocessing.StandardScaler transformation to the output of
    ("date", DayOfYearTransformer(), date_attribs),
How can I do that?
You can combine the two with a pipeline that first runs DayOfYearTransformer and then StandardScaler. In code, this reads like:
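from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), cat_attribs),
    # The nested pipeline scales the day-of-year values produced by the custom transformer
    ("date", make_pipeline(DayOfYearTransformer(), StandardScaler()), date_attribs),
])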
I used the convenience function make_pipeline to build an sklearn Pipeline object.
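To check that the nested pipeline behaves as described, here is a minimal end-to-end sketch. The `merch` DataFrame is made-up toy data, since the question does not show the real one, and the transformer is simplified to return only the day-of-year column. After fitting, the last output column should have roughly zero mean and unit variance.

```python
from datetime import datetime

import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


# Mixins are listed before BaseEstimator, as scikit-learn's developer guide recommends.
class DayOfYearTransformer(TransformerMixin, BaseEstimator):
    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        # X is the DataFrame slice selected by ColumnTransformer (only 'date' here)
        day_of_year = [
            datetime.strptime(s, "%Y-%m-%d").timetuple().tm_yday
            for s in X["date"]
        ]
        # Return a 2-D float column so StandardScaler can consume it
        return np.array(day_of_year, dtype=float).reshape(-1, 1)


# Hypothetical toy data, for illustration only
merch = pd.DataFrame({
    "country": ["FI", "SE", "FI", "NO"],
    "store": ["A", "B", "A", "B"],
    "product": ["mug", "hat", "hat", "mug"],
    "date": ["2020-01-01", "2020-03-01", "2020-07-01", "2020-12-31"],
})

full_pipeline = ColumnTransformer([
    ("cat", OneHotEncoder(), ["country", "store", "product"]),
    ("date", make_pipeline(DayOfYearTransformer(), StandardScaler()), ["date"]),
])

merch_prepared = full_pipeline.fit_transform(merch)
# ColumnTransformer may return a sparse matrix; densify it for inspection
if hasattr(merch_prepared, "toarray"):
    merch_prepared = merch_prepared.toarray()

# Transformers are concatenated in order, so the scaled date is the last column
date_column = merch_prepared[:, -1]
print(date_column.mean(), date_column.std())
```

Because StandardScaler sits inside the "date" branch only, the one-hot columns are left untouched; scaling them as well would require wrapping the "cat" branch in its own pipeline.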