Sklearn components in pipeline are not fitted even if the whole pipeline is?
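I am trying to pick a component/transformer out of a fitted pipeline to inspect its behavior. However, when I retrieve the component, it reports as not fitted, even though using the pipeline as a whole works without problems. That suggests the pipeline is fitted, and so its components should be too.
Can someone explain why, and suggest how to inspect the components of a fitted pipeline?
Here is a reproducible example: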
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV

np.random.seed(0)

# Read data from Titanic dataset.
titanic_url = ('https://raw.githubusercontent.com/amueller/'
               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)

# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

X = data.drop('survived', axis=1)
y = data['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))
Calling:
clf.get_params()['preprocessor__cat__imputer'].transform(X)
or
clf.named_steps['preprocessor'].transformers[0][1].named_steps['imputer'].transform(X)
results in an error like this:
NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
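The ColumnTransformer attribute transformers holds the input, unfitted transformers: fit() works on clones of them and leaves the originals untouched. To access the fitted transformers, use the attribute transformers_ or named_transformers_. I suppose get_params()['preprocessor__cat__imputer'] also returns the unfitted input transformer.
(You will still get an error, because the imputer will also try to process the string data, and strategy='median' will fail on it.)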
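To make the difference concrete, here is a minimal sketch. It assumes a tiny made-up DataFrame (`toy`) in place of the Titanic data and a simplified preprocessor with only `age`, `fare` and `sex`; those names and values are illustrative and not taken from the question. It reaches the fitted imputer through named_transformers_ and passes it only the numeric columns, which also avoids the string-data problem mentioned above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.utils.validation import check_is_fitted

# Hypothetical stand-in for the Titanic data (assumption, not the real dataset).
toy = pd.DataFrame({
    'age': [22.0, np.nan, 35.0, 58.0, 41.0, 19.0],
    'fare': [7.25, 71.3, np.nan, 26.55, 8.05, 30.0],
    'sex': ['male', 'female', 'female', 'male', 'male', 'female'],
    'survived': [0, 1, 1, 0, 0, 1],
})
numeric_features = ['age', 'fare']
categorical_features = ['sex']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
clf.fit(toy.drop('survived', axis=1), toy['survived'])

fitted_preprocessor = clf.named_steps['preprocessor']

# `transformers` (no trailing underscore) still holds the unfitted inputs.
input_imputer = fitted_preprocessor.transformers[0][1].named_steps['imputer']
try:
    check_is_fitted(input_imputer)
    input_is_fitted = True
except NotFittedError:
    input_is_fitted = False
print("input imputer fitted:", input_is_fitted)

# `named_transformers_` (trailing underscore) holds the fitted clones.
fitted_imputer = fitted_preprocessor.named_transformers_['num'].named_steps['imputer']
check_is_fitted(fitted_imputer)  # does not raise
print("learned medians:", fitted_imputer.statistics_)

# Pass only the columns this transformer was fitted on, not the full frame.
imputed = fitted_imputer.transform(toy[numeric_features])
print(imputed)
```

The same lookup works positionally through transformers_, for example fitted_preprocessor.transformers_[0][1] for the first (name, transformer, columns) entry.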