Loss of feature names when one-hot encoding
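I built a pipeline that uses OneHotEncoder. After fitting it, transforming the training and test sets, and converting the result to a data frame, the features have no names. Is there a way to get the name of each encoded feature?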
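from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numerical column transformer: fill missing values with the mean, then standardize
num_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])
# Categorical column transformer: fill missing values with the mode, then one-hot encode
cat_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])
# Preprocessing pipeline
preprocessor = ColumnTransformer(
    transformers=[
        ('num', num_transformer, numerical_cols),
        ('cat', cat_transformer, categorical_cols)
    ])
# Fit on the training set only, then apply the fitted preprocessor to the test set
X_train_preprocessed = preprocessor.fit_transform(X_train)
# transform (not fit_transform) so the test set reuses the statistics and categories learned from X_train
test_preprocessed = preprocessor.transform(test)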
You can reach the Pipeline through the named_transformers_ attribute of ColumnTransformer. You have two transformers, named 'num' and 'cat', so preprocessor.named_transformers_['cat'] gives you access to your cat_transformer. Then, through that pipeline's named_steps attribute, you can access the OneHotEncoder named 'onehot' and its categories_ attribute:
# The output below implies categorical_cols = [0] and numerical_cols = [1]
X = [['Male', 1], ['Female', 3], ['Female', 2]]
preprocessor.fit_transform(X)
Out[6]:
array([[-1.22474487, 0. , 1. ],
[ 1.22474487, 1. , 0. ],
[ 0. , 1. , 0. ]])
preprocessor.named_transformers_['cat'].named_steps['onehot'].categories_
Out[7]: [array(['Female', 'Male'], dtype=object)]