How to use exponential smoothing to smooth a time series in Python?

Question

I am trying to smooth a time series using exponential smoothing.

Suppose my time series looks like this:

import pandas as pd

data = [446.6565,  454.4733,  455.663 ,  423.6322,  456.2713,  440.5881, 425.3325,  485.1494,  506.0482,  526.792 ,  514.2689,  494.211 ]
index= pd.date_range(start='1996', end='2008', freq='A')
oildata = pd.Series(data, index)

I would like to get a smoothed version of that time series.

If I do something like this:

from statsmodels.tsa.api import SimpleExpSmoothing
fit1 = SimpleExpSmoothing(oildata).fit(smoothing_level=0.2,optimized=False)
fcast1 = fit1.forecast(3).rename(r'$\alpha=0.2$')

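It only outputs the three forecast values, not a smoothed version of my original time series. Is there a way to get the smoothed version of the original series?

I am happy to provide more details if needed.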

Answer 1

ExponentialSmoothing is not a tool for smoothing time series data; it is a time series forecasting method.

The fit() function will return an instance of the HoltWintersResults class that contains the learned coefficients. The forecast() or the predict() function on the result object can be called to make a forecast.

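So when you call predict, the class produces forecasts using the learned coefficients.

However, to smooth the time series you can use the fittedvalues attribute, as @smarie pointed out.

That said, I would choose a more suitable tool, such as savgol_filter: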

from scipy.signal import savgol_filter
savgol_filter(oildata, 5, 3)

array([444.87816   , 461.58666   , 444.99296   , 441.70785143,
       442.40769143, 438.36852857, 441.50125714, 472.05622571,
       512.20891429, 521.74822857, 517.63141429, 493.37037143])

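As noted in the comments, the Savitzky-Golay filter fits a polynomial of the given order (polyorder) by least squares within a sliding window of the given size (window_length) and evaluates it at each point, which yields a smoothed time series.

With the settings above, the result looks like this: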

import matplotlib.pyplot as plt
plt.plot(oildata)
plt.plot(pd.Series(savgol_filter(oildata, 5, 3), index=oildata.index))
plt.show()
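
To make the local polynomial fit concrete, here is a minimal sketch that reproduces one interior value of the savgol_filter output by hand with np.polyfit. The variable names (offsets, manual, and so on) are only illustrative, and the check covers an interior point only; near the edges savgol_filter uses its own boundary handling (mode='interp' by default).

```python
import numpy as np
from scipy.signal import savgol_filter

data = np.array([446.6565, 454.4733, 455.663, 423.6322, 456.2713, 440.5881,
                 425.3325, 485.1494, 506.0482, 526.792, 514.2689, 494.211])

window_length, polyorder = 5, 3
smoothed = savgol_filter(data, window_length, polyorder)

# Reproduce one interior point by hand: fit a cubic by least squares to the
# 5 samples centred on position i, then evaluate that cubic at the centre.
i = 5
half = window_length // 2
offsets = np.arange(-half, half + 1)  # [-2, -1, 0, 1, 2]
coeffs = np.polyfit(offsets, data[i - half:i + half + 1], polyorder)
manual = np.polyval(coeffs, 0.0)

print(smoothed[i], manual)
```

Both printed numbers should agree with the sixth entry of the array shown above (about 438.3685).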

Answer 2

Apparently, you can get the smoothed values from the model's fittedvalues attribute.

import pandas as pd

data = [446.6565,  454.4733,  455.663 ,  423.6322,  456.2713,  440.5881, 425.3325,  485.1494,  506.0482,  526.792 ,  514.2689,  494.211 ]
index= pd.date_range(start='1996', end='2008', freq='A')
oildata = pd.Series(data, index)

from statsmodels.tsa.api import SimpleExpSmoothing
fit1 = SimpleExpSmoothing(oildata).fit(smoothing_level=0.2,optimized=False)
# fcast1 = fit1.forecast(3).rename(r'$\alpha=0.2$')

import matplotlib.pyplot as plt
plt.plot(oildata)
plt.plot(fit1.fittedvalues)
plt.show()

This plots the original series together with its smoothed version.

The documentation states:

fittedvalues: ndarray

An array of the fitted values. Fitted by the Exponential Smoothing model.

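Note that you can also use the fittedfcast attribute, which contains all the fitted values plus the first forecast, or the fcastvalues attribute, which contains only the forecasts.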
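
For intuition about what "smoothed values" means here, the simple exponential smoothing recursion s[t] = alpha * x[t] + (1 - alpha) * s[t-1] can be written out by hand. The sketch below assumes the level is initialised with the first observation and uses alpha = 0.2 as in the question; it then checks the result against pandas' ewm(alpha=..., adjust=False).mean(), which uses the same recursion. The statsmodels fittedvalues are in-sample predictions and depend on how the initial level is chosen, so they need not match these numbers exactly.

```python
import numpy as np
import pandas as pd

data = [446.6565, 454.4733, 455.663, 423.6322, 456.2713, 440.5881,
        425.3325, 485.1494, 506.0482, 526.792, 514.2689, 494.211]
alpha = 0.2  # smoothing level, same value as in the question

# Assumption: start the level at the first observation.
smoothed = np.empty(len(data))
smoothed[0] = data[0]
for t in range(1, len(data)):
    # New level = weighted average of the new observation and the old level.
    smoothed[t] = alpha * data[t] + (1 - alpha) * smoothed[t - 1]

# pandas applies the same recursion when adjust=False.
ewm_smoothed = pd.Series(data).ewm(alpha=alpha, adjust=False).mean()

print(np.allclose(smoothed, ewm_smoothed.to_numpy()))
```

A smaller alpha gives a smoother curve that reacts more slowly to changes, while alpha close to 1 follows the original series almost exactly.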

python

signal-processing

machine-learning

time-series

forecasting