Why are all the coefficients except the first (intercept) very close to zero (on the order of 1e-17 or smaller) in my OLS regression model?

Question

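I wrote the following Python code using the statsmodels package to build an OLS regression model. I tried it with several different datasets, and in every resulting model all coefficients except the first (intercept) are close to zero. What could be wrong with the code?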

data1 = pandas.concat([Y, X], axis = 1)
dta = lagmat2ds(data1, mxlg, trim='both', dropex=1)
dtaown = sm.add_constant(dta[:, 0:(mxlg + 1)], prepend = False)
dtajoint = sm.add_constant(dta[:, 0:], prepend = False)
res2down = sm.OLS(dta[:, 0], dtaown).fit()
res2djoint = sm.OLS(dta[:, 0], dtajoint).fit()

Here sm is statsmodels.api (import statsmodels.api as sm). For testing, you can use the sample dataset sm.datasets.spector.

Answer 1

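It's the way your data is structured: you are modeling Y against Y | lag Y | constant. Column 0 of dta is Y itself, and the slice dta[:, 0:(mxlg + 1)] keeps that column, so Y appears on both sides of the regression. Note that the OLS documentation (https://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html) states: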

No constant is added by the model unless you are using formulas.

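So the first value you see is not the intercept; it is the coefficient from fitting Y on Y, i.e. 1.0. Because Y explains itself exactly, the remaining coefficients are zero up to floating-point rounding, which is why they come out around 1e-17. Also, with prepend=False, add_constant appends the column of ones as the last column, so the intercept is the last coefficient, not the first.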

To check that you get sensible results, exclude Y from the predictors like this:

res2down = sm.OLS(dta[:, 0], dtaown[:, 1:]).fit()


Tags: python, linear-regression, python-3.x, statsmodels