scikit 管道中的链接转换
Chaining transformations in scikit pipeline
我正在使用 scikit 管道在数据集上创建预处理。我有一个包含四个变量的数据集:['monetary', 'frequency1', 'frequency2', 'recency']
,我想预处理除 recency
之外的所有变量。预处理,我想先获取日志,然后进行标准化。但是,当我从管道中获取转换后的数据时,我得到了 7 列(3 列日志、3 列标准化、新近度)。有没有一种方法可以链接转换,这样我就可以获取日志并在日志执行标准化之后只获得一个 4 特征数据集?
def create_pipeline(df):
all_but_recency = ['monetary', 'frequency1','frequency2']
# Preprocess
preprocessor = ColumnTransformer(
transformers=[
( 'log', FunctionTransformer(np.log), all_but_recency ),
( 'standardize', preprocessing.StandardScaler(), all_but_recency ) ],
remainder='passthrough')
# Pipeline
estimators = [( 'preprocess', preprocessor )]
pipe = Pipeline(steps=estimators)
print(pipe.set_params().fit_transform(df).shape)
提前致谢
您必须按顺序应用 FunctionTransformer
。试试这个!
def create_pipeline(df):
all_but_recency = ['monetary', 'frequency1','frequency2']
# Preprocess
# Preprocess
preprocessor1 = ColumnTransformer([('log', FunctionTransformer(np.log), all_but_recency)],'passthrough')
preprocessor2 = ColumnTransformer([('standardize', preprocessing.StandardScaler(), all_but_recency)],'passthrough' )
# Pipeline
estimators = [('preprocess1', preprocessor1),('standardize', preprocessor2)]
pipe = Pipeline(steps=estimators)
print(pipe.set_params().fit_transform(df).shape)
工作示例
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
iris = load_iris()
X, y = iris.data, iris.target
df= pd.DataFrame(X,columns = iris.feature_names)
all_but_one = [0,1,2]
# Preprocess
preprocessor1 = ColumnTransformer([('log', FunctionTransformer(np.log), all_but_one)],'passthrough')
preprocessor2 = ColumnTransformer([('standardize', preprocessing.StandardScaler(), all_but_one)],'passthrough' )
# Pipeline
estimators = [('preprocess1', preprocessor1),('scalling', preprocessor2)]
pipe = Pipeline(steps=estimators,)
pipe.fit_transform(df)
我正在使用 scikit 管道在数据集上创建预处理。我有一个包含四个变量的数据集:['monetary', 'frequency1', 'frequency2', 'recency']
,我想预处理除 recency
之外的所有变量。预处理,我想先获取日志,然后进行标准化。但是,当我从管道中获取转换后的数据时,我得到了 7 列(3 列日志、3 列标准化、新近度)。有没有一种方法可以链接转换,这样我就可以获取日志并在日志执行标准化之后只获得一个 4 特征数据集?
def create_pipeline(df):
all_but_recency = ['monetary', 'frequency1','frequency2']
# Preprocess
preprocessor = ColumnTransformer(
transformers=[
( 'log', FunctionTransformer(np.log), all_but_recency ),
( 'standardize', preprocessing.StandardScaler(), all_but_recency ) ],
remainder='passthrough')
# Pipeline
estimators = [( 'preprocess', preprocessor )]
pipe = Pipeline(steps=estimators)
print(pipe.set_params().fit_transform(df).shape)
提前致谢
您必须按顺序应用 FunctionTransformer
。试试这个!
def create_pipeline(df):
all_but_recency = ['monetary', 'frequency1','frequency2']
# Preprocess
# Preprocess
preprocessor1 = ColumnTransformer([('log', FunctionTransformer(np.log), all_but_recency)],'passthrough')
preprocessor2 = ColumnTransformer([('standardize', preprocessing.StandardScaler(), all_but_recency)],'passthrough' )
# Pipeline
estimators = [('preprocess1', preprocessor1),('standardize', preprocessor2)]
pipe = Pipeline(steps=estimators)
print(pipe.set_params().fit_transform(df).shape)
工作示例
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn import preprocessing
iris = load_iris()
X, y = iris.data, iris.target
df= pd.DataFrame(X,columns = iris.feature_names)
all_but_one = [0,1,2]
# Preprocess
preprocessor1 = ColumnTransformer([('log', FunctionTransformer(np.log), all_but_one)],'passthrough')
preprocessor2 = ColumnTransformer([('standardize', preprocessing.StandardScaler(), all_but_one)],'passthrough' )
# Pipeline
estimators = [('preprocess1', preprocessor1),('scalling', preprocessor2)]
pipe = Pipeline(steps=estimators,)
pipe.fit_transform(df)