Python Linear Regression Combination Problem

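I need to compute a linear regression and the MSE for each pair of variables in a data frame. The problem is that I cannot fit xtrain, which has two variables, against ytrain, even though my ytrain has only one column.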

Code:

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01)

Problem:

from itertools import combinations
for c in combinations(range(4), 2):
    lr=LinearRegression()
    lr.fit(Xtrain[:,c].reshape(-1,1),ytrain)
    yp=lr.predict(Xtest[:,c].reshape(-1,1))
    print('MSE', np.sum((ytest - yp)**2) / len(ytest))

Error:

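You do not need to call reshape on the feature matrices, because they are already two-dimensional: Xtrain[:, c] with two column indices has shape (n_samples, 2). Calling reshape(-1, 1) flattens it to shape (2 * n_samples, 1), so the number of rows no longer matches ytrain. If you remove the reshape, your code works; see below.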

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from itertools import combinations
import numpy as np

X, y = make_regression(n_samples=100, n_features=4, n_informative=3, n_targets=1, noise=0.01, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

for c in combinations(range(4), 2):

    lr = LinearRegression()
    lr.fit(X_train[:, c], y_train)
    yp = lr.predict(X_test[:, c])

    print('MSE', np.sum((y_test - yp) ** 2) / len(y_test))

# MSE 591.707619290734
# MSE 33.613143724590564
# MSE 634.3248475857874
# MSE 1646.9447686107499
# MSE 2293.2878076807942
# MSE 1700.2559702871085
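
To see why the reshape call breaks the fit, it can help to look at the array shapes directly. The following is a minimal sketch on a small made-up array; the names X_demo, pair and flattened are illustrative and do not come from the original code.

```python
import numpy as np

# Toy feature matrix: 3 samples, 4 features (values are arbitrary).
X_demo = np.arange(12, dtype=float).reshape(3, 4)

# One column pair, as produced by combinations(range(4), 2).
c = (0, 2)

# Selecting two columns keeps the array two-dimensional.
pair = X_demo[:, c]

# reshape(-1, 1) stacks all values into a single column,
# doubling the number of rows.
flattened = pair.reshape(-1, 1)

print(pair.shape)       # (3, 2)
print(flattened.shape)  # (6, 1)
```

With 3 samples, the reshaped array has 6 rows, so it can no longer be paired row by row with a target vector of length 3. The same mismatch occurs between Xtrain[:, c].reshape(-1, 1) and ytrain in the question.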