Using StandardScaler on specific column in Pipeline and concatenate to original data

Question

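I have a dataframe with 4 numeric columns, and I am trying to scale just one of them using StandardScaler inside a Pipeline. I am using the code below to scale and transform that column.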

num_feat = ['Quantity']
num_trans = Pipeline([('scale', StandardScaler())])

preprocessor = ColumnTransformer(transformers = ['num', num_trans, num_feat])

pipe = Pipeline([('preproc', preprocessor),
                ('rf', RandomForestRegressor(random_state = 0))
                ])

After that, I split my data and train the model as follows.

y = df1['target']
x = df1.drop(['target','ID'], axis = 1)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
pipe.fit(x_train, y_train)

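This gives me the error ValueError: not enough values to unpack (expected 3, got 1). I suspect this is because my dataframe has 3 other numeric columns. How can I concatenate the scaled column back to the rest of the dataframe and train my model on the whole data? Or is there a better way to do this?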

Answer 1

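Wrap the transformer entry in parentheses when you initialize ColumnTransformer. Each item in transformers must be a (name, transformer, columns) tuple; passing the three values as a flat list is what triggers the unpacking error, not the other numeric columns. Adding remainder='passthrough' keeps the columns you did not list, so the scaled column and the untouched columns are passed on to the model together.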

preprocessor = ColumnTransformer(transformers = [('num', num_trans, num_feat)],remainder='passthrough')
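
For reference, here is a minimal end-to-end sketch of the corrected pipeline. The original df1 is not shown, so the dataframe below is a hypothetical stand-in: the column names Price, Discount and Weight and the synthetic target are assumptions made only so the example runs. The inner Pipeline around StandardScaler is dropped because a single scaler can be passed to ColumnTransformer directly.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for df1: four numeric features plus ID and target.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({
    "ID": np.arange(100),
    "Quantity": rng.integers(1, 500, size=100),
    "Price": rng.normal(20, 5, size=100),
    "Discount": rng.uniform(0, 0.3, size=100),
    "Weight": rng.normal(2, 0.5, size=100),
})
df1["target"] = df1["Quantity"] * df1["Price"] + rng.normal(0, 10, size=100)

num_feat = ["Quantity"]

# Each transformer entry is a (name, transformer, columns) tuple.
# remainder="passthrough" forwards the unlisted columns unchanged.
preprocessor = ColumnTransformer(
    transformers=[("num", StandardScaler(), num_feat)],
    remainder="passthrough",
)

pipe = Pipeline([
    ("preproc", preprocessor),
    ("rf", RandomForestRegressor(random_state=0)),
])

y = df1["target"]
x = df1.drop(["target", "ID"], axis=1)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)
pipe.fit(x_train, y_train)

# Inspect what the model actually receives: scaled Quantity + 3 other columns.
transformed = pipe.named_steps["preproc"].transform(x_train)
print(transformed.shape)  # (80, 4)
```

Note that ColumnTransformer places the transformed columns first and appends the passthrough columns after them, so the column order seen by the model can differ from the order in the original dataframe.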

python

pipeline

scikit-learn