Creating a pipeline for one-hot-encoded variables not working

Question

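I'm having trouble applying transformations to my categorical feature 'country' and to the rest of my numeric columns. How should I do this? Here is what I tried: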

preprocess = make_column_transformer(
    (numeric_cols, make_pipeline(MinMaxScaler())),
    (categorical_cols, OneHotEncoder()))

model = make_pipeline(preprocess,XGBClassifier())

model.fit(X_train, y_train)

Note that numeric_cols is passed as a list, and so is categorical_cols.

However, I get this error: TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. The message then prints the list of all my numeric columns, followed by (type <class 'list'>) doesn't.

What am I doing wrong, and how can I handle unseen categories in the country column?

Answer 1

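You need to put the transformer first and the columns second, so each tuple is (transformer, columns). Your code passes the column list where a transformer is expected, which is why the error complains that a list does not implement fit and transform. The help page gives the signature: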

sklearn.compose.make_column_transformer(*transformers, **kwargs)

Something like the following will work:

from sklearn.preprocessing import StandardScaler, OneHotEncoder,MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

from xgboost import XGBClassifier

import numpy as np
import pandas as pd

X = pd.DataFrame({'x1':np.random.uniform(0,1,5),
                   'x2':np.random.choice(['A','B'],5)})

# integer labels: recent XGBoost versions expect classes 0..n-1 and reject string labels
y = pd.Series(np.random.choice([0, 1], 5))
 
numeric_cols = X.select_dtypes('number').columns.to_list()
categorical_cols = X.select_dtypes('object').columns.to_list()
    
preprocess = make_column_transformer(
    (MinMaxScaler(),numeric_cols),
    (OneHotEncoder(),categorical_cols)
    )

model = make_pipeline(preprocess,XGBClassifier())
model.fit(X,y)

Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('minmaxscaler',
                                                  MinMaxScaler(), ['x1']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(), ['x2'])])),
                ('xgbclassifier', XGBClassifier())])
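
As for unseen categories in the country column: one option is to pass handle_unknown='ignore' to OneHotEncoder, which encodes a category not seen during fit as all zeros instead of raising an error. Below is a minimal sketch with made-up data. The values 'US', 'FR' and 'JP' are assumptions for illustration only, and the classifier is left out to keep the example small.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; column names and values are illustrative only.
X_train = pd.DataFrame({'x1': [0.1, 0.5, 0.9],
                        'country': ['US', 'FR', 'US']})
X_new = pd.DataFrame({'x1': [0.3],
                      'country': ['JP']})  # 'JP' never appears in X_train

preprocess = make_column_transformer(
    (MinMaxScaler(), ['x1']),
    # handle_unknown='ignore': unseen categories become an all-zero row
    # in the one-hot block instead of raising an error at transform time
    (OneHotEncoder(handle_unknown='ignore'), ['country']),
    sparse_threshold=0,  # always return a dense array, easier to inspect
)

preprocess.fit(X_train)
encoded = preprocess.transform(X_new)
print(encoded)  # first column is scaled x1, remaining columns are the country one-hot block
```

With the default handle_unknown='error', the transform call on X_new would raise instead. Whether an all-zero encoding is acceptable depends on the model; it effectively treats every unknown country the same way.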
