Different coefficients: scikit-learn vs statsmodels (logistic regression)
When I run a logistic regression, the coefficients I get with statsmodels are correct (I verified them against some course material). However, I am unable to get the same coefficients with sklearn. I have tried preprocessing the data, to no avail. Here is my code:
Statsmodels:
import statsmodels.api as sm
X_const = sm.add_constant(X)  # statsmodels does not add an intercept automatically
model = sm.Logit(y, X_const)
results = model.fit()
print(results.summary())
The relevant output is:
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -0.2382 3.983 -0.060 0.952 -8.045 7.569
a 2.0349 0.837 2.430 0.015 0.393 3.676
b 0.8077 0.823 0.981 0.327 -0.806 2.421
c 1.4572 0.768 1.897 0.058 -0.049 2.963
d -0.0522 0.063 -0.828 0.407 -0.176 0.071
e_2 0.9157 1.082 0.846 0.397 -1.205 3.037
e_3 2.0080 1.052 1.909 0.056 -0.054 4.070
Scikit-learn (no preprocessing):
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
results = model.fit(X, y)
print(results.coef_)
print(results.intercept_)
The coefficients given are:
array([[ 1.29779008, 0.56524976, 0.97268593, -0.03762884, 0.33646097,
0.98020901]])
The intercept/constant given is:
array([ 0.0949539])
As you can see, regardless of which coefficient corresponds to which variable, the numbers sklearn gives do not match the correct ones from statsmodels. What am I missing? Thanks in advance!
I am not familiar with statsmodels, but could it be that this library's .fit() method uses different default parameters than sklearn's? To verify this, you could try explicitly setting the same corresponding parameters for each .fit() call and see whether you still get different results.
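For what it's worth, you can list sklearn's defaults with get_params() (a quick illustrative check, not from the original thread):

from sklearn.linear_model import LogisticRegression

# Print the default hyperparameters of LogisticRegression
print(LogisticRegression().get_params())
# shows e.g. 'C': 1.0 and 'penalty': 'l2' -- an L2 penalty is applied by default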
Thanks to a kind soul on reddit, this was solved. To get the same coefficients, one has to neutralise the regularisation that sklearn applies to logistic regression by default:
model = LogisticRegression(C=1e8)
where C, according to the documentation, is:
C : float, default: 1.0
Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
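A minimal end-to-end check of this fix (a sketch, assuming X and y are defined as in the question; max_iter is raised only to ensure convergence):

import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression

# Unpenalised maximum-likelihood fit with statsmodels (intercept added explicitly)
sm_params = np.asarray(sm.Logit(y, sm.add_constant(X)).fit(disp=0).params)

# sklearn with regularisation effectively switched off via a huge C
sk = LogisticRegression(C=1e8, max_iter=1000).fit(X, y)
sk_params = np.r_[sk.intercept_, sk.coef_[0]]  # intercept first, matching statsmodels' ordering

print(np.allclose(sm_params, sk_params, atol=1e-3))  # should print True

On recent scikit-learn versions you can also disable the penalty explicitly with LogisticRegression(penalty=None) (penalty='none' on older releases), which avoids relying on a very large C.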