How to check multiple linear regression results per parameter (sklearn model)
I am using the default multiple linear regression in sklearn:
from sklearn import linear_model
regr = linear_model.LinearRegression()
model = regr.fit(X, y)
predictions = model.predict(X)
When I look at predictions, the result looks like this:
ApplicationID
2019XXX68954 0.700000
2020XXX59500 0.642747
2020XXX52277 0.405954
What I want:
ApplicationID Variable1 Variable2 Score
2019XXX68954 0.200000 0.500000 0.700000
2020XXX59500 ........ ........ 0.642747
2020XXX52277 ........ ........ 0.405954
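By Variable1 and Variable2 I mean the partial scores produced by each coefficient times its variable in this multiple regression, so that I can see which variable contributes most to the score.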
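Because a linear model is purely additive, the score for sample $i$ splits exactly into one term per variable plus the intercept. Here $\beta_0$ corresponds to `model.intercept_` and $\beta_1, \dots, \beta_p$ to the entries of `model.coef_`; $c_{ij}$ is the partial score of variable $j$ for sample $i$:

$$
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} = \beta_0 + \sum_{j=1}^{p} c_{ij},
\qquad c_{ij} = \beta_j x_{ij}
$$

Note that $c_{ij}$ depends on the scale of $x_{ij}$, so comparing contributions across variables is most meaningful when the features are on comparable scales.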
IIUC, you can multiply model.coef_ element-wise with your X:
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Toy data: 10 samples, 3 features, intercept (bias) 0.9
X, y = make_regression(n_samples=10, n_features=3, bias=0.9, random_state=51)
model = LinearRegression()
model.fit(X, y)
# Form the dataframe: column j holds coef_[j] * X[:, j], i.e. feature j's contribution
data = X * model.coef_
columns = [f"Variable{j}" for j, _ in enumerate(model.coef_, start=1)]
result = pd.DataFrame(data, columns=columns)
# Put the intercept in its own column, too
result.insert(0, "Variable0", model.intercept_)
This gives:
>>> result
Variable0 Variable1 Variable2 Variable3
0 0.9 -17.538372 4.172825 108.040511
1 0.9 156.267901 -18.817702 -50.471148
2 0.9 -21.506439 -40.510528 -30.320019
3 0.9 110.403966 40.281776 31.840830
4 0.9 -41.648604 -3.187173 71.067339
5 0.9 -76.860056 27.791395 -48.228522
6 0.9 -82.160185 3.718984 -4.145350
7 0.9 17.780070 -49.726577 -90.128025
8 0.9 55.302550 63.892190 44.852370
9 0.9 -6.689355 -44.186517 -87.087998
A sanity check: the sum of each row of result should equal the model's prediction for that sample:
>>> np.allclose(model.predict(X), result.sum(axis=1))
True
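To get closer to the layout in the question (ApplicationID as the row label plus a Score column), one option is to keep X as a pandas DataFrame: multiplying a DataFrame by a 1-D array of length n_features scales each column and keeps the index and column names. The sketch below is a minimal illustration under assumptions; the ApplicationID values, feature names and target are made up, and the TopVariable column (largest absolute contribution, intercept excluded) is just one possible way to answer "which variable contributes most".

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: ApplicationID values, feature names and target are invented for illustration
rng = np.random.default_rng(0)
X = pd.DataFrame(
    rng.normal(size=(5, 2)),
    columns=["Variable1", "Variable2"],
    index=pd.Index([f"APP{i:03d}" for i in range(5)], name="ApplicationID"),
)
y = 0.3 * X["Variable1"] + 0.5 * X["Variable2"] + 0.1

model = LinearRegression().fit(X, y)

# DataFrame * 1-D array of length n_features multiplies each column by its coefficient,
# keeping the ApplicationID index and the column names
contrib = X * model.coef_
contrib["Intercept"] = model.intercept_
contrib["Score"] = model.predict(X)

# Feature with the largest absolute contribution in each row (intercept excluded)
contrib["TopVariable"] = contrib[X.columns].abs().idxmax(axis=1)
print(contrib)
```

Using the absolute value means a large negative contribution counts as "large" too; drop `.abs()` if only positive pushes on the score matter.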