Correctly interpreting the array returned by statsmodels.tsa.ar_model.ar_select_order to determine the optimal lag
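Using statsmodels 0.12.0, I am trying to determine the optimal lag for a statsmodels.tsa.ar_model.AutoReg model. I am using US population data with a monthly time step and passing a maximum lag of 12 to statsmodels.tsa.ar_model.ar_select_order for evaluation.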
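import pandas as pd
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

# Raw string so the backslash in the Windows path is not parsed as a \u escape
df = pd.read_csv(r'Data\uspopulation.csv', index_col='DATE', parse_dates=True)
df.index.freq = 'MS'  # monthly data, month-start frequency
train_data = df.iloc[:84]  # first 84 months for fitting
test_data = df.iloc[84:]   # remaining months held out for evaluation
modelp = ar_select_order(train_data['PopEst'], maxlag=12)
modelp.ar_lags  # lags of the selected model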
The code above returns a numpy array of [ 1 2 3 4 5 6 7 8 9 10 11 12], which I am interpreting as "The optimal lag p is 12" as per this Stack Overflow question. However, evaluating on some metrics (RMSE), I find that my AutoReg models fitted with maxlag=12 perform worse than lower-order models. By trial and error I found that the optimal lag is 8. So I am having difficulty interpreting the resulting numpy array. I have been reading the resources on statsmodels.com/ar_select_order and statsmodels.com/autoregressions, but they have not made it clearer.
Does anyone here have any thoughts? I am new to Python libraries and feeling a bit lost.
The code above returns a numpy array of [ 1 2 3 4 5 6 7 8 9 10 11 12], which I am interpreting as "The optimal lag p is 12" as per this Stack Overflow question.
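Yes, that is correct. It returns an array rather than just 12 because, if you set glob=True, it can also search over models that do not include every lag. For example, [ 1 2 3 12] might be a common result for a monthly model with some annual seasonal pattern.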
However, evaluating on some metrics (RMSE), I find that my AutoReg models fitted with maxlag=12 perform worse than lower-order models. By trial and error I found that the optimal lag is 8. So I am having difficulty interpreting the resulting numpy array. I have been reading the resources on statsmodels.com/ar_select_order and statsmodels.com/autoregressions, but they have not made it clearer.
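This function returns the model judged best according to information criteria. In particular, the default is BIC, the Bayesian information criterion. If you use a different criterion, such as minimizing out-of-sample RMSE, it is certainly possible to find that a different model is judged optimal.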