Regression analysis using statsmodels

Question

Please help me get the output of this code. Why is the output of this code nan? What did I do wrong?

import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import pandas as pd
import matplotlib.pyplot as plt
import math
import datetime as dt
#importing Data
es_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/hbrbcpe.txt'
vs_url = 'https://www.stoxx.com/document/Indices/Current/HistoricalData/h_vstoxx.txt'
#creating DataFrame
cols=['SX5P','SX5E','SXXP','SXXE','SXXF','SXXA','DK5f','DKXF']
es=pd.read_csv(es_url,index_col=0,parse_dates=True,sep=';',dayfirst=True,header=None,skiprows=4,names=cols)
vs=pd.read_csv(vs_url,index_col=0,header=2,parse_dates=True,sep=',',dayfirst=True)
data=pd.DataFrame({'EUROSTOXX' : es['SX5E'][es.index > dt.datetime(1999,1,1)]},dtype=float)
data=data.join(pd.DataFrame({'VSTOXX' : vs['V2TX'][vs.index > dt.datetime(1999,1,1)]},dtype=float))
data=data.fillna(method='ffill')
rets=(((data/data.shift(1))-1)*100).round(2)
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
#regression analysis
model = smf.ols('ydat ~ xdat',data=rets).fit()
print model.summary()

Answer 1

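The problem is that when you compute rets, dividing by zero produces inf. In addition, shift introduces NaNs, so you have missing values that must be handled in some way before running the regression.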
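A quick way to confirm this is to count the non-finite entries per column before fitting. The following is only a minimal sketch: `count_non_finite` is a hypothetical helper, and the small `example` frame is made-up data standing in for `rets`.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame):
    """Return, per column, how many entries are NaN and how many are +/-inf."""
    values = frame.to_numpy(dtype=float)
    return pd.DataFrame(
        {
            "nan": np.isnan(values).sum(axis=0),
            "inf": np.isinf(values).sum(axis=0),
        },
        index=frame.columns,
    )


# Made-up returns standing in for `rets` (assumption, not the real data).
example = pd.DataFrame(
    {
        "EUROSTOXX": [np.nan, 0.08, np.inf, -1.2],
        "VSTOXX": [np.nan, 0.0, 0.0, 0.0],
    }
)

counts = count_non_finite(example)
print(counts)
```

Any nonzero count in either column means the regression input needs cleaning first.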

Walk through this example with your data and see:

df = data.loc['2016-03-20':'2016-04-01'].copy()

df looks like:

            EUROSTOXX   VSTOXX
2016-03-21    3048.77  35.6846
2016-03-22    3051.23  35.6846
2016-03-23    3042.42  35.6846
2016-03-24    2986.73  35.6846
2016-03-25       0.00  35.6846
2016-03-28       0.00  35.6846
2016-03-29    3004.87  35.6846
2016-03-30    3044.10  35.6846
2016-03-31    3004.93  35.6846
2016-04-01    2953.28  35.6846

Shift by 1 and divide:

df = (((df/df.shift(1))-1)*100).round(2)

This prints:

             EUROSTOXX  VSTOXX
2016-03-21         NaN     NaN
2016-03-22    0.080688     0.0
2016-03-23   -0.288736     0.0
2016-03-24   -1.830451     0.0
2016-03-25 -100.000000     0.0
2016-03-28         NaN     0.0
2016-03-29         inf     0.0
2016-03-30    1.305547     0.0
2016-03-31   -1.286751     0.0
2016-04-01   -1.718842     0.0

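Key points: shifting by 1 always creates a NaN in the first row. Dividing 0.00 by 0.00 produces NaN (2016-03-28), and dividing a nonzero value by 0.00 produces inf (2016-03-29).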
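The same behaviour can be reproduced on a tiny series. This is an illustrative sketch only: the `prices` values are made up to mimic the two zero rows above, and the return formula is the one from the question without the rounding.

```python
import numpy as np
import pandas as pd

# Made-up prices with two consecutive zeros, mimicking the rows above.
prices = pd.Series([100.0, 102.0, 0.0, 0.0, 101.0])

# Percent change versus the previous row, as in the question.
returns = (prices / prices.shift(1) - 1) * 100

# Row 0 has no previous row, so it is NaN.
# Row 2 is 0 / 102 - 1, i.e. a -100% return.
# Row 3 is 0 / 0, which is NaN.
# Row 4 is 101 / 0, which is inf.
print(returns)
```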

One possible solution for handling the missing values:

...
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']

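# handle missing values: find the rows of xdat that are NaN or +/-inf
messed_up_indices = xdat[xdat.isin([-np.inf, np.inf, np.nan])].index
# turn the infinities into NaN, then fill every NaN with the mean of xdat
xdat[messed_up_indices] = xdat[messed_up_indices].replace([-np.inf, np.inf], np.nan)
xdat[messed_up_indices] = xdat[messed_up_indices].fillna(xdat.mean())
# on the same rows, fill any NaN in ydat with 0.0
ydat[messed_up_indices] = ydat[messed_up_indices].fillna(0.0)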

#regression analysis
model = smf.ols('ydat ~ xdat',data=rets, missing='raise').fit()
print(model.summary())

Note that I added the missing='raise' argument to ols to see what is going on.

The final result prints:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   ydat   R-squared:                       0.259
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     1593.
Date:                Wed, 03 Jan 2018   Prob (F-statistic):          5.76e-299
Time:                        12:01:14   Log-Likelihood:                -13856.
No. Observations:                4554   AIC:                         2.772e+04
Df Residuals:                    4552   BIC:                         2.773e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.1608      0.075      2.139      0.033       0.013       0.308
xdat          -1.4209      0.036    -39.912      0.000      -1.491      -1.351
==============================================================================
Omnibus:                     4280.114   Durbin-Watson:                   2.074
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          4021394.925
Skew:                          -3.446   Prob(JB):                         0.00
Kurtosis:                     148.415   Cond. No.                         2.11
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
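
As an alternative to filling the bad rows with the mean, the affected days could simply be dropped. The sketch below is not the approach used in the answer above: `drop_non_finite` is a hypothetical helper and `rets_example` is made-up data standing in for `rets`.

```python
import numpy as np
import pandas as pd


def drop_non_finite(frame):
    """Turn +/-inf into NaN, then drop every row with a NaN in any column."""
    return frame.replace([np.inf, -np.inf], np.nan).dropna()


# Made-up returns standing in for `rets` (assumption, not the real data).
rets_example = pd.DataFrame(
    {
        "EUROSTOXX": [np.nan, 0.08, -100.0, np.nan, np.inf, 1.31],
        "VSTOXX": [np.nan, 0.5, -0.2, 0.0, 0.1, -0.4],
    }
)

clean = drop_non_finite(rets_example)
print(clean)
```

The cleaned frame could then be passed to the regression by column name, for example `smf.ols('VSTOXX ~ EUROSTOXX', data=clean)`. Dropping rows shrinks the sample and will give somewhat different coefficients than the mean-filled version, so which option is preferable depends on how many days are affected.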

