Creating a pipeline for one-hot-encoded variables not working
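I am running into a problem when trying to apply transformations to my categorical feature 'country' and to the rest of my numeric columns. How should I do this? Here is what I tried: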
preprocess = make_column_transformer(
(numeric_cols, make_pipeline(MinMaxScaler())),
(categorical_cols, OneHotEncoder()))
model = make_pipeline(preprocess,XGBClassifier())
model.fit(X_train, y_train)
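Note that numeric_cols is passed as a list, and so is categorical_cols.
However, I get this error: TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers.
The message then prints the list of all my numeric columns, followed by: (type <class 'list'>) doesn't.
What am I doing wrong, and how do I handle unseen categories in the country column?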
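You need to put the transformer first and the columns after it in each tuple. In your code they are swapped, so the column list is treated as the estimator, which is why the error complains about a list. If you look at the help page, it reads:
sklearn.compose.make_column_transformer(*transformers, **kwargs)
where each entry of transformers is a tuple of the form (transformer, columns). Something like the following will work: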
from sklearn.preprocessing import StandardScaler, OneHotEncoder,MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier
import numpy as np
import pandas as pd
X = pd.DataFrame({'x1':np.random.uniform(0,1,5),
'x2':np.random.choice(['A','B'],5)})
# Integer labels: recent XGBoost releases expect classes encoded as 0..n-1
y = pd.Series(np.random.choice([0, 1], 5))
numeric_cols = X.select_dtypes('number').columns.to_list()
categorical_cols = X.select_dtypes('object').columns.to_list()
preprocess = make_column_transformer(
(MinMaxScaler(),numeric_cols),
(OneHotEncoder(),categorical_cols)
)
model = make_pipeline(preprocess,XGBClassifier())
model.fit(X,y)
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('minmaxscaler',
MinMaxScaler(), ['x1']),
('onehotencoder',
OneHotEncoder(), ['x2'])])),
('xgbclassifier', XGBClassifier())])
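As for the second part of the question (unseen categories in the country column), one option is the handle_unknown parameter of OneHotEncoder. The sketch below is only an illustration on made-up data: the column names 'x1' and 'country' and the values in X_train and X_test are assumptions, not the asker's dataset, and the classifier is left out so that only the preprocessing step is shown.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; 'country' stands in for the asker's categorical column.
X_train = pd.DataFrame({"x1": [0.1, 0.5, 0.9], "country": ["US", "FR", "US"]})
X_test = pd.DataFrame({"x1": [0.3], "country": ["DE"]})  # 'DE' is not in the training data

preprocess = make_column_transformer(
    (MinMaxScaler(), ["x1"]),
    # handle_unknown="ignore" encodes a category not seen during fit as an
    # all-zero row instead of raising an error at transform time.
    (OneHotEncoder(handle_unknown="ignore"), ["country"]),
)

preprocess.fit(X_train)
encoded = preprocess.transform(X_test)
# ColumnTransformer may return a sparse matrix; convert it for display.
encoded = encoded.toarray() if hasattr(encoded, "toarray") else encoded
print(encoded)
```

With the default handle_unknown="error", the transform call on X_test would raise instead. The same preprocess object can be passed to make_pipeline together with XGBClassifier, exactly as in the answer above.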