Generate statistical tables in Python and export to Excel
I want to generate publication-quality statistical tables in Python.
In Stata, this can be done with the community-contributed estout family of commands:
sysuse auto, clear
regress mpg weight
estimates store A
regress mpg weight price
estimates store B
regress mpg weight price length
estimates store C
regress mpg weight price length displacement
estimates store D
esttab A B C D, se r2 nonumber mtitle("Model 1" "Model 2" "Model 3" "Model 4")
----------------------------------------------------------------------------
                  Model 1         Model 2         Model 3         Model 4
----------------------------------------------------------------------------
weight           -0.00601***     -0.00582***     -0.00304        -0.00354
                (0.000518)      (0.000618)       (0.00177)       (0.00212)
price                          -0.0000935       -0.000173       -0.000174
                                (0.000163)      (0.000168)      (0.000169)
length                                            -0.0966         -0.0947
                                                  (0.0577)        (0.0582)
displacement                                                      0.00433
                                                                 (0.00983)
_cons               39.44***        39.44***        49.68***        50.02***
                   (1.614)         (1.622)         (6.329)         (6.410)
----------------------------------------------------------------------------
N                      74              74              74              74
R-sq                0.652           0.653           0.666           0.667
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
How can I run multiple regressions in Python and summarize the results in some nice-looking tables?
I would also like to export these to Excel.
You can use the summary_col() function from statsmodels:
import pandas as pd
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
df = pd.read_stata('http://www.stata-press.com/data/r14/auto.dta')
df['cons'] = 1
Y = df['mpg']
X1 = df[['weight', 'cons']]
X2 = df[['weight', 'price', 'cons']]
X3 = df[['weight', 'price', 'length', 'cons']]
X4 = df[['weight', 'price', 'length', 'displacement', 'cons']]
reg1 = sm.OLS(Y, X1).fit()
reg2 = sm.OLS(Y, X2).fit()
reg3 = sm.OLS(Y, X3).fit()
reg4 = sm.OLS(Y, X4).fit()
results = summary_col(
    [reg1, reg2, reg3, reg4],
    stars=True,
    float_format='%0.2f',
    model_names=['Model\n(1)', 'Model\n(2)', 'Model\n(3)', 'Model\n(4)'],
    info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
               'R2': lambda x: "{:.2f}".format(x.rsquared)})
The snippet above produces the following:
print(results)
================================================
             Model    Model    Model    Model
             (1)      (2)      (3)      (4)
------------------------------------------------
cons         39.44*** 39.44*** 49.68*** 50.02***
             (1.61)   (1.62)   (6.33)   (6.41)
displacement                            0.00
                                        (0.01)
length                         -0.10*   -0.09
                               (0.06)   (0.06)
price                 -0.00    -0.00    -0.00
                      (0.00)   (0.00)   (0.00)
weight       -0.01*** -0.01*** -0.00*   -0.00*
             (0.00)   (0.00)   (0.00)   (0.00)
N            74       74       74       74
R2           0.65     0.65     0.67     0.67
================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
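Note that the two legends use different star thresholds: the esttab table marks p<0.05, p<0.01 and p<0.001, while the summary_col table marks p<.1, p<.05 and p<.01. That is probably why, for example, length in Model 3 carries a star in the Python output but not in the Stata output. The sketch below only illustrates the two conventions; the stars() helper and the example p-value are hypothetical and not part of statsmodels or estout.

```python
def stars(p_value, thresholds):
    """Return one '*' for every threshold that p_value falls strictly below."""
    return "*" * sum(p_value < t for t in thresholds)


# Thresholds copied from the two legends shown in the tables above.
SUMMARY_COL = (0.1, 0.05, 0.01)   # * p<.1, ** p<.05, *** p<.01
ESTTAB = (0.05, 0.01, 0.001)      # * p<0.05, ** p<0.01, *** p<0.001

p = 0.07  # hypothetical p-value, for illustration only
print(repr(stars(p, SUMMARY_COL)))  # one star under the summary_col legend
print(repr(stars(p, ESTTAB)))       # no star under the esttab legend
```

When comparing tables across the two tools, it may be safer to compare the coefficients and standard errors directly rather than the stars.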
Then you simply export:
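results_text = results.as_text()
# Note: as_text() returns a fixed-width text table, not comma-separated values.
# The context manager closes the file even if the write fails.
with open("table.csv", "w") as result_file:
    result_file.write(results_text)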