Which formula is used for t-value and standard error in statsmodels OLS
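I want to understand how the Python statsmodels library works. When I compute the OLS t-value and standard error (SEE, or bse) with the formulas from econometrics, I get answers that differ from what statsmodels reports. (This is OLS with zero intercept.)
I have: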
x = [1,2,3]
y = [7,3,5]
and with this code I get the same R^2 and residuals as statsmodels:
import pandas as pd

def ols(x, y):
    # OLS through the origin: coeff = sum(x*y) / sum(x^2)
    df = pd.DataFrame(data={'x': x, 'y': y})
    coeff = sum(df['y'] * df['x']) / sum(df['x'] ** 2)
    df['predict'] = df['x'] * coeff
    # R^2 (uncentered, because there is no intercept)
    n = len(df)
    rss = sum((df['y'] - df['predict']) ** 2)
    tss = sum((df['y']) ** 2)
    r2 = 1 - rss/tss
    # Residuals
    resid = (df['y'] - df['predict']).values
    return coeff, r2, resid, df
Here is my statsmodels object:
from statsmodels.api import OLS
ols_obj = OLS(y, x).fit()
print('coeff', (ols_obj.predict(x)/x)[0])
print('R^2', ols_obj.rsquared)
print('resid', ols_obj.resid)
print('t', ols_obj.tvalues)
print('param', ols_obj.params[0], '| bse', ols_obj.bse[0], '| param/bse', ols_obj.params[0]/ols_obj.bse)
coeff 2.0
R^2 0.6746987951807228
resid [ 5. -1. -1.]
t [2.03670031]
param 2.0 | bse 0.9819805060619657 | param/bse [2.03670031]
Here is my function:
coeff, r2, resid, df = ols(x, y)
print('coeff', coeff)
print('R^2', r2)
print('resid', resid)
coeff 2.0
R^2 0.6746987951807228
resid [ 5. -1. -1.]
But for the t-value I get the wrong number.
From econometrics I use this formula for the standard error:
SE(b) = sqrt( ( sum(resid^2) / (n-2) ) / sum( (x - mean(x) ) **2 ) )
SE(b) = 3.6742346141747673
What am I doing wrong?
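Like @Josef, I believe that is the formula for a model with an intercept. Without an intercept only one parameter is estimated, so the residual variance uses n - 1 degrees of freedom and x is not centered. If you follow the matrix derivation (Wiki link):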
import numpy as np
import statsmodels.api as sm
x = np.array([1,2,3])
y = np.array([7,3,5])
resx = sm.OLS(y, x).fit()
# residual variance: n - 1 degrees of freedom (3 observations, 1 parameter, no intercept)
res_variance = (1/(3-1))*sum(resx.resid**2)
# standard error of the estimator
beta_se = np.sqrt(res_variance * (1 / (x.T @ x)))  # x.T @ x is a scalar here; use np.linalg.inv otherwise
new_tval = resx.params / beta_se  # 2.036700..
which is the same as
resx.tvalues # 2.036700..
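To make the difference concrete, here is a minimal sketch in plain Python that evaluates both standard error formulas on the data from the question. The variable names are only illustrative. The with-intercept formula is applied to the no-intercept residuals, as in the question.

```python
import math

x = [1, 2, 3]
y = [7, 3, 5]
n = len(x)

# Slope of the regression through the origin: b = sum(x*y) / sum(x^2)
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid = [yi - b * xi for xi, yi in zip(x, y)]
ssr = sum(r ** 2 for r in resid)  # sum of squared residuals

# Formula for a model WITH an intercept: n - 2 degrees of freedom, centered x
mean_x = sum(x) / n
se_intercept_formula = math.sqrt((ssr / (n - 2)) / sum((xi - mean_x) ** 2 for xi in x))

# Formula for a model WITHOUT an intercept: n - 1 degrees of freedom, uncentered x
se_no_intercept = math.sqrt((ssr / (n - 1)) / sum(xi ** 2 for xi in x))

print(se_intercept_formula)  # about 3.6742, the number from the question
print(se_no_intercept)       # about 0.98198, matching the bse shown above
print(b / se_no_intercept)   # about 2.0367, matching the t-value shown above
```

Both the degrees of freedom and the centering change between the two formulas, which is why the gap between 3.67 and 0.98 is so large on three data points.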
I found the formula for the standard error, SE(b), when there is no intercept.
For a matrix x (several regressors), it looks like this:
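SSE = np.dot(residual.T, residual)
DFE = x.shape[0] - x.shape[1]  # observations minus estimated coefficients (no intercept column)
MSE = SSE/DFE
inverted = np.linalg.inv(np.dot(x.T, x))
covariance = inverted * MSE
bse = np.sqrt(np.diag(covariance))  # standard errors are the square roots of the diagonal
tvalues = coeff/bse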
For a 1-D array x, it looks like this:
bse = math.sqrt(sum(resid**2) / ((len(df)-1) * sum( (df.x ** 2 ))))
tvalue = coeff/bse
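As a sanity check, the matrix form can be run on the question's data treated as a 3x1 design matrix. This is a sketch using NumPy only. It assumes the degrees of freedom are the number of rows minus the number of columns, which reduces to n - 1 here.

```python
import numpy as np

# Design matrix with a single column and no intercept column
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([7.0, 3.0, 5.0])

n, k = x.shape
xtx_inv = np.linalg.inv(x.T @ x)    # (X'X)^-1
coeff = xtx_inv @ x.T @ y           # OLS coefficients
residual = y - x @ coeff

mse = (residual @ residual) / (n - k)   # residual variance with n - k degrees of freedom
covariance = xtx_inv * mse              # covariance matrix of the coefficients
bse = np.sqrt(np.diag(covariance))      # standard errors
tvalues = coeff / bse

print(coeff, bse, tvalues)  # roughly [2.] [0.98198] [2.0367]
```

With one column the result agrees with the 1-D expression above, and the same code should carry over to several regressors as long as x.T @ x is invertible.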