Adding a column-dropping step to a Pipeline
Normally we would call df.drop('column_name', axis=1) to remove a column from a DataFrame.
I want to add this step as a transformer in a Pipeline.
Example:
numerical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),
('scaler', StandardScaler(with_mean=False))
])
How can I do this?
You can wrap your Pipeline in a ColumnTransformer, which lets you select the columns that the pipeline processes, as follows:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_selector, make_column_transformer

col_to_exclude = 'A'
df = pd.DataFrame({'A': [0]*10, 'B': [1]*10, 'C': [2]*10})

numerical_transformer = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(with_mean=False)
)

# Columns not matched by the selector are dropped (remainder='drop' is the default)
transform = make_column_transformer(
    (numerical_transformer, make_column_selector(pattern=f'^(?!{col_to_exclude})'))
)
transform.fit_transform(df)
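Note: I use a regex pattern here to exclude column A. The negative lookahead excludes every column whose name starts with A, so adjust the pattern if other column names share that prefix.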
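If the goal is only to drop a column and keep the rest untouched, ColumnTransformer also accepts the string "drop" in place of a transformer. The following is a minimal sketch under that assumption; the step name "drop_A" and the three-row sample frame are illustrative and not part of the original answer.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer

df = pd.DataFrame({"A": [0] * 3, "B": [1] * 3, "C": [2] * 3})

# "drop" is a built-in specifier: the listed columns are removed, and
# remainder="passthrough" keeps every other column unchanged.
dropper = ColumnTransformer(
    transformers=[("drop_A", "drop", ["A"])],  # "drop_A" is just a label
    remainder="passthrough",
)

result = dropper.fit_transform(df)  # only columns B and C remain
```

Because this is an ordinary estimator, it can be placed as the first step of a Pipeline ahead of the imputer and scaler.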
You can write a custom transformer like this:
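from sklearn.base import BaseEstimator, TransformerMixin

class columnDropperTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; dropping columns is stateless
        return self

    def transform(self, X, y=None):
        # Expects a pandas DataFrame
        return X.drop(self.columns, axis=1)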
And use it in a pipeline:
import pandas as pd
from sklearn.pipeline import Pipeline

# sample dataframe
df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8]
})

# your pipeline
pipeline = Pipeline([
    ("columnDropper", columnDropperTransformer(['col_2', 'col_3']))
])

# apply the pipeline to the dataframe
pipeline.fit_transform(df)
Output:
  col_1  col_4
0     a      5
1     b      6
2     c      7
3     d      8
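To tie this back to the pipeline in the question, a drop step can be placed in front of the imputer and scaler. The sketch below uses FunctionTransformer instead of a custom class, which is a swapped-in alternative and not the approach shown above; drop_columns is a hypothetical helper name, and the columns dropped here are chosen only so that the remaining data is numeric.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def drop_columns(X, columns):
    """Hypothetical helper: return the DataFrame without the given columns."""
    return X.drop(columns=columns)


df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8],
})

pipeline = Pipeline(steps=[
    # Drop the string columns first so the later steps only see numbers
    ("dropper", FunctionTransformer(drop_columns,
                                    kw_args={"columns": ["col_1", "col_2"]})),
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler(with_mean=False)),
])

scaled = pipeline.fit_transform(df)  # array with one column each for col_3, col_4
```

FunctionTransformer avoids writing a class for a stateless step, while the custom transformer is easier to extend if the step later needs to learn something in fit.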