AttributeError: 'Pipeline' object has no attribute 'get_feature_names'
I have a pipeline built as follows:
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('text',
                                                  Pipeline(steps=[('CV',
                                                                   CountVectorizer())]),
                                                  'Tweet'),
                                                 ('category',
                                                  OneHotEncoder(handle_unknown='ignore'),
                                                  ['Tweet_ID']),
                                                 ('numeric',
                                                  Pipeline(steps=[('knnImputer',
                                                                   KNNImputer(n_neighbors=2)),
                                                                  ('scaler',
                                                                   MinMaxScale...
                                                  'CS',
                                                  'UC',
                                                  'CL',
                                                  'S',
                                                  'SS',
                                                  'UW',
                                                  ...])])),
                ('classifier', LogisticRegression())])
I am trying to get the feature names:
feature_names = lr['preprocessor'].transformers_[0][1].get_feature_names()
coefs = lr.named_steps["classifier"].coef_.flatten()
zipped = zip(feature_names, coefs)
features_df = pd.DataFrame(zipped, columns=["feature", "value"])
features_df["ABS"] = features_df["value"].apply(lambda x: abs(x))
features_df["colors"] = features_df["value"].apply(lambda x: "green" if x > 0 else "red")
features_df = features_df.sort_values("ABS", ascending=False)
features_df
But I get an error:
----> 6 feature_names = lr['preprocessor'].transformers_[0][1].get_feature_names()
7 coefs = lr.named_steps["classifier"].coef_.flatten()
8
AttributeError: 'Pipeline' object has no attribute 'get_feature_names'
I have gone through the following answers:
- 'OneHotEncoder' object has no attribute 'get_feature_names'
but unfortunately they were not as helpful as I had hoped.
Does anyone know how to fix this?
I am happy to provide more information if needed.
The pipeline is defined as follows:
lr = Pipeline(steps=[('preprocessor', preprocessing),
('classifier', LogisticRegression(C=5, tol=0.01, solver='lbfgs', max_iter=10000))])
where the preprocessing step is:
preprocessing = ColumnTransformer(
    transformers=[
        ('text', text_preprocessing, 'Tweet'),
        ('category', categorical_preprocessing, c_feat),
        ('numeric', numeric_preprocessing, n_feat)
    ], remainder='passthrough')
I separated the different types of features before splitting into train and test sets:
text_columns=['Tweet']
target=['Label']
c_feat=['Tweet_ID']
num_features=['CS','UC','CL','S','SS','UW']
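Following David's answer and the link, I tried the following.
For numerical features:
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

class NumericalTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        super().__init__()

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Numerical features to pass down the numerical pipeline.
        # num_features is already a list, so index with X[num_features],
        # not X[[num_features]].
        X = X[num_features]
        X = X.replace([np.inf, -np.inf], np.nan)
        return X.values

# Defining the steps in the numerical pipeline
numerical_pipeline = Pipeline(steps=[
    ('num_transformer', NumericalTransformer()),
    ('imputer', KNNImputer(n_neighbors=2)),
    ('minmax', MinMaxScaler())])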
For categorical features:
from sklearn.preprocessing import OneHotEncoder

class CategoricalTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        super().__init__()

    # Return self, nothing else to do here
    def fit(self, X, y=None):
        return self

    # Helper function that converts values to binary labels depending on input
    def create_binary(self, obj):
        if obj == 0:
            return 'No'
        else:
            return 'Yes'

    # Transformer method for this transformer
    def transform(self, X, y=None):
        # Categorical features to pass down the categorical pipeline.
        # c_feat is already a list, so index with X[c_feat].
        return X[c_feat].values

# Defining the steps in the categorical pipeline
categorical_pipeline = Pipeline(steps=[
    ('cat_transformer', CategoricalTransformer()),
    ('one_hot_encoder', OneHotEncoder(handle_unknown='ignore'))])
For text features:
from sklearn.feature_extraction.text import CountVectorizer

class TextTransformer(BaseEstimator, TransformerMixin):
    def __init__(self):
        super().__init__()

    # Return self, nothing else to do here
    def fit(self, X, y=None):
        return self

    # Helper function that converts values to binary labels depending on input
    def create_binary(self, obj):
        if obj == 0:
            return 'No'
        else:
            return 'Yes'

    # Transformer method for this transformer
    def transform(self, X, y=None):
        # Text feature to pass down the text pipeline.
        # CountVectorizer expects a 1-D sequence of strings, so select
        # the column as a Series rather than a one-column DataFrame.
        return X['Tweet'].values

# Defining the steps in the text pipeline
text_pipeline = Pipeline(steps=[
    ('text_transformer', TextTransformer()),
    ('cv', CountVectorizer())])
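Then I combined the numerical, text, and categorical pipelines horizontally into one full pipeline:
from sklearn.pipeline import FeatureUnion

# using FeatureUnion
union_pipeline = FeatureUnion(transformer_list=[
    ('categorical_pipeline', categorical_pipeline),
    ('numerical_pipeline', numerical_pipeline),
    ('text_pipeline', text_pipeline)])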
Finally:
# Combining the custom imputer with the categorical, text and numerical pipeline
preprocess_pipeline = Pipeline(steps=[('custom_imputer', CustomImputer()),
                                      ('full_pipeline', union_pipeline)])
What is still unclear is how to get the feature names.
You need to implement a dedicated get_feature_names function, because you are using custom transformers.
For details, see this question, where you can find a code example.