Different scores when using a scikit-learn pipeline vs. doing it manually
Below is a simple example using MinMaxScaler, PolynomialFeatures, and LinearRegression.
Using a pipeline:
pipeLine = make_pipeline(MinMaxScaler(),PolynomialFeatures(), LinearRegression())
pipeLine.fit(X_train,Y_train)
print(pipeLine.score(X_test,Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)
0.4433729905419167
3.4067909278765605
[ 0. -7.60868833 5.87162697]
Doing it manually:
X_trainScaled = MinMaxScaler().fit_transform(X_train)
X_trainScaledandPoly = PolynomialFeatures().fit_transform(X_trainScaled)
X_testScaled = MinMaxScaler().fit_transform(X_test)
X_testScaledandPoly = PolynomialFeatures().fit_transform(X_testScaled)
reg = LinearRegression()
reg.fit(X_trainScaledandPoly,Y_train)
print(reg.score(X_testScaledandPoly,Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)
0.44099256691782807
3.4067909278765605
[ 0. -7.60868833 5.87162697]
True
[ True True True]
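The problem is in your manual steps: you refit the scaler on the test data. Fit the scaler on the training data only, then use that fitted instance to transform the test data. This is also why the intercept and coefficients match in both versions (the training path is identical) while the test score differs (only the test-set scaling changed). For details, see the related discussion of using StandardScaler before and after splitting data.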
from sklearn.datasets import make_regression
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

X, y = make_regression(n_features=3, n_samples=50, n_informative=1, noise=1)
X_train, X_test, Y_train, Y_test = train_test_split(X, y)

# Pipeline: each step is fitted on X_train and only applied to X_test
pipeLine = make_pipeline(MinMaxScaler(), PolynomialFeatures(), LinearRegression())
pipeLine.fit(X_train, Y_train)
print(pipeLine.score(X_test, Y_test))
print(pipeLine.steps[2][1].intercept_)
print(pipeLine.steps[2][1].coef_)

# Manual: fit the scaler on the training data, then reuse the fitted instance
scaler = MinMaxScaler().fit(X_train)
X_trainScaled = scaler.transform(X_train)
X_testScaled = scaler.transform(X_test)  # transform only, no refit on test data

# Same rule for the polynomial expansion: fit on train, transform both sets
poly = PolynomialFeatures().fit(X_trainScaled)
X_trainScaledandPoly = poly.transform(X_trainScaled)
X_testScaledandPoly = poly.transform(X_testScaled)

reg = LinearRegression()
reg.fit(X_trainScaledandPoly, Y_train)
print(reg.score(X_testScaledandPoly, Y_test))
print(reg.intercept_)
print(reg.coef_)
print(reg.intercept_ == pipeLine.steps[2][1].intercept_)
print(reg.coef_ == pipeLine.steps[2][1].coef_)
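To see where the mismatch comes from, the sketch below puts the pipeline, the corrected manual path, and the question's leaky manual path side by side. It is only an illustration under some assumptions that are not in the original post: the fixed `random_state=0`, the names `pipe`, `poly`, `leaky_scaler` and the `*_score` variables, and the use of `np.isclose` / `np.allclose` instead of `==` for comparing floats.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic data; random_state is fixed here only to make the run repeatable (assumption).
X, y = make_regression(n_features=3, n_samples=50, n_informative=1,
                       noise=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pipeline: fit() fits every step on the training data; score() only transforms X_test.
pipe = make_pipeline(MinMaxScaler(), PolynomialFeatures(), LinearRegression())
pipe.fit(X_train, y_train)
pipe_score = pipe.score(X_test, y_test)

# Correct manual path: every transformer is fitted on the training data only.
scaler = MinMaxScaler().fit(X_train)
poly = PolynomialFeatures().fit(scaler.transform(X_train))
reg = LinearRegression().fit(poly.transform(scaler.transform(X_train)), y_train)
good_score = reg.score(poly.transform(scaler.transform(X_test)), y_test)

# Leaky manual path (as in the question): a second scaler is fitted on the test data,
# so the test features are mapped with a different min/max than the model was trained on.
leaky_scaler = MinMaxScaler().fit(X_test)
leaky_score = reg.score(poly.transform(leaky_scaler.transform(X_test)), y_test)

print("pipeline == correct manual:", np.isclose(pipe_score, good_score))
print("same per-feature minima:   ", np.allclose(scaler.data_min_, leaky_scaler.data_min_))
print("pipeline score:", pipe_score)
print("leaky score:   ", leaky_score)
```

The two scalers learn different `data_min_` / `data_max_` values because they see different rows, which is why the leaky path scores the same model on differently scaled inputs.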