ExponentialSmoothing - What prediction method to use for this date plot?

Question

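I currently have these data points of dates vs. cumulative sum. I want to predict the cumulative sum for future dates using Python. What prediction method should I use?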

My date series is in this format: ['2020-01-20', '2020-01-24', '2020-01-26', '2020-01-27', '2020-01-30', '2020-01-31'] dtype='datetime64[ns]'

I tried a spline, but it seems a spline can't handle a datetime series.

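I tried exponential smoothing for time-series prediction, but the results aren't right. I don't understand what predict(3) means and why it returns the predicted sum for dates I already have. I copied this code from an example. Here is my exponential smoothing code: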

fit1 = ExponentialSmoothing(date_cumsum_df).fit(smoothing_level=0.3,optimized=False)

fcast1 = fit1.predict(3)

fcast1



2020-01-27       1.810000
2020-01-30       2.467000
2020-01-31       3.826900
2020-02-01       5.978830
2020-02-02       7.785181
2020-02-04       9.949627
2020-02-05      11.764739
2020-02-06      14.535317
2020-02-09      17.374722
2020-02-10      20.262305
2020-02-16      22.583614
2020-02-18      24.808530
2020-02-19      29.065971
2020-02-20      39.846180
2020-02-21      58.792326
2020-02-22     102.054628
2020-02-23     201.038240
2020-02-24     321.026768
2020-02-25     474.318737
2020-02-26     624.523116
2020-02-27     815.166181
2020-02-28    1100.116327
2020-02-29    1470.881429
2020-03-01    1974.317000
2020-03-02    2645.321900
2020-03-03    3295.025330
2020-03-04    3904.617731

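Which method is best for forecasting a sum that seems to be increasing exponentially? Also, I'm quite new to data science in Python, so please go easy on me. Thanks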

Answer 1

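Exponential smoothing only works on a time series without missing time points, i.e. with regularly spaced dates. I'll show you how to forecast your data 5 days into the future with the three methods you mentioned:

Exponential fit (your guess that it "seems to be exponentially increasing")
Spline interpolation
Exponential smoothing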

Note: I got your data by reading the values off your plot, and saved the dates to dates and the data values to values.
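To see why plain exponential smoothing trips over this data, it can help to check whether the dates are regularly spaced. The sketch below is a minimal illustration using the first six dates from the question; the values attached to them are hypothetical placeholders, since only the spacing matters. It uses pandas' `infer_freq` and `asfreq` to expose the missing days.

```python
import pandas as pd

# First six dates from the question; the values are hypothetical placeholders
dates = pd.to_datetime(['2020-01-20', '2020-01-24', '2020-01-26',
                        '2020-01-27', '2020-01-30', '2020-01-31'])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# infer_freq returns None when the spacing between timestamps is irregular
print(pd.infer_freq(s.index))

# Reindexing onto a daily grid exposes every missing day as NaN
daily = s.asfreq('D')
print(len(daily), int(daily.isna().sum()))
```

A non-None result from `infer_freq` (for example after filling the gaps) is a quick sanity check before handing a series to `ExponentialSmoothing`.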

import pandas as pd
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from scipy.optimize import curve_fit
from scipy.interpolate import splrep, splev

df = pd.DataFrame()
# mdates.date2num allows functions like curve_fit and spline to digest time series data
df['dates'] = mdates.date2num(dates)
df['values'] = values 

# Exponential fit function
def exponential_func(x, a, b, c, d):
    return a*np.exp(b*(x-c))+d

# Spline interpolation
def spline_interp(x, y, x_new):
    tck = splrep(x, y)
    return splev(x_new, tck)

# define forecast timerange (forecasting 5 days into future)
dates_forecast = np.linspace(df['dates'].min(), df['dates'].max() + 5, 100)
dd = mdates.num2date(dates_forecast)

# Doing exponential fit
popt, pcov = curve_fit(exponential_func, df['dates'], df['values'], 
                       p0=(1, 1e-2, df['dates'][0], 1))

# Doing spline interpolation
yy = spline_interp(df['dates'], df['values'], dates_forecast)

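So far that's straightforward (except for the mdates.date2num function). Because your data has missing dates, you have to run spline interpolation on the actual data to fill the missing time points with interpolated values: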
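One detail about the exponential fit: in `a*np.exp(b*(x-c))+d`, the parameters `a` and `c` are redundant, because `a*exp(b*(x-c))` equals `(a*exp(-b*c))*exp(b*x)`. The sketch below is an assumption-laden alternative, not the answer's method: it measures `x` in days since the first date (instead of `mdates.date2num` values) and drops `c`. The function name `exp_offset` and the noise-free synthetic data are hypothetical, used only to show that `curve_fit` recovers known parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_offset(x, a, b, d):
    # Hypothetical model a * e^(b*x) + d, with x in days since the first date
    return a * np.exp(b * x) + d

# 14 consecutive days, converted to float day offsets 0..13
dates = np.arange('2020-01-20', '2020-02-03', dtype='datetime64[D]')
x = (dates - dates[0]) / np.timedelta64(1, 'D')

# Synthetic, noise-free data with known parameters a=2, b=0.3, d=1
y = exp_offset(x, 2.0, 0.3, 1.0)

# A rough initial guess; small day offsets keep the exponent well scaled
popt, pcov = curve_fit(exp_offset, x, y, p0=(1.0, 0.2, 0.0))
print(popt)
```

Working in day offsets keeps `b*x` small, which is the same reason the original passes `df['dates'][0]` as the initial guess for `c`.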

# Interpolating data for exponential smoothing (no missing data in time series allowed)
# Regular daily index spanning the observed dates; deriving it from `dates`
# (instead of hard-coding start/end) keeps it the same length as the values
daily_index = pd.date_range(start=pd.Timestamp(dates[0]), end=pd.Timestamp(dates[-1]), freq='D')
df_interp = pd.DataFrame()
df_interp['dates'] = daily_index
df_interp['values'] = spline_interp(df['dates'], df['values'],
                                    mdates.date2num(daily_index.to_pydatetime()))
series_interp = pd.Series(df_interp['values'].values, index=daily_index)
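The same gap-filling step can be sketched without matplotlib's `date2num`: convert the dates to float day offsets with NumPy, fit an interpolating cubic spline, and evaluate it on a daily `pd.date_range`. This is a minimal sketch assuming the question's first six dates; the cumulative values are hypothetical numbers made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import splrep, splev

# Irregular dates from the question with hypothetical cumulative values
dates = np.array(['2020-01-20', '2020-01-24', '2020-01-26',
                  '2020-01-27', '2020-01-30', '2020-01-31'], dtype='datetime64[D]')
values = np.array([1.0, 3.0, 6.0, 8.0, 15.0, 19.0])

# Float "days since the first date" so scipy can work with them
x = (dates - dates[0]) / np.timedelta64(1, 'D')

# Cubic spline; without weights splrep defaults to s=0 (passes through the points)
tck = splrep(x, values)

# Regular daily grid with an explicit frequency, as ExponentialSmoothing expects
daily_index = pd.date_range(start=dates[0], end=dates[-1], freq='D')
x_daily = (daily_index.values - daily_index.values[0]) / np.timedelta64(1, 'D')
series_interp = pd.Series(splev(x_daily, tck), index=daily_index)
print(series_interp)
```

Because the spline interpolates, the filled series still matches the original observations on their own dates; only the missing days are new.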

# Now exponential smoothing works; pass the `trend` argument because your data
# has a clear (roughly exponential) trend
fit1 = ExponentialSmoothing(series_interp, trend='mul').fit(optimized=True)
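Why the `trend` argument matters: the question's model (`smoothing_level=0.3`, no trend) is simple exponential smoothing, whose forecast is a flat line at the last smoothed level, so it lags behind growing data. The sketch below contrasts that with Holt's method using a multiplicative trend, written in plain Python. The function names are hypothetical, and the initialisation (level = first value, growth = ratio of the first two values) is a simplifying assumption for this sketch; statsmodels initialises and optimises these quantities differently.

```python
def ses_forecast(y, alpha, h):
    """Simple exponential smoothing: the h-step forecast is flat."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h


def holt_mul_forecast(y, alpha, beta, h):
    """Holt's method with a multiplicative (exponential) trend.

    Initialisation is a simple assumption for this sketch,
    not what statsmodels does.
    """
    level, growth = y[0], y[1] / y[0]
    for obs in y[1:]:
        prev_level = level
        # Level: blend the new observation with last level times growth
        level = alpha * obs + (1 - alpha) * prev_level * growth
        # Growth: blend the latest level ratio with the previous growth
        growth = beta * (level / prev_level) + (1 - beta) * growth
    # Forecast k steps ahead by compounding the growth factor
    return [level * growth ** k for k in range(1, h + 1)]


y = [1.0, 2.0, 4.0, 8.0, 16.0]  # toy series that doubles every step
print(ses_forecast(y, 0.3, 3))
print(holt_mul_forecast(y, 0.5, 0.5, 3))
```

On this doubling series the flat SES forecast stays well below the last observation, while the multiplicative-trend forecast keeps doubling.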

You can plot all three methods to see how they forecast the next five days:

# Plot data
plt.plot(mdates.num2date(df['dates']), df['values'], 'o', label='data')
# Plot exponential function fit
plt.plot(dd, exponential_func(dates_forecast, *popt), label='exponential fit')
# Plot spline interpolation (extrapolated past the last date)
plt.plot(dd, yy, label='spline')
# Plot exponential smoothing: interpolated history followed by `forecast(5)`
fcast = fit1.forecast(5)
plt.plot(np.concatenate([series_interp.index.values, fcast.index.values]),
         np.concatenate([series_interp.values, fcast.values]),
         label='exponential smoothing')
plt.legend()
plt.show()

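Comparing the three methods suggests you were right to choose exponential smoothing: its forecast for the next five days looks more plausible than those of the other two methods.

Regarding your other question: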

I don't understand what predict(3) means and why it returns the predicted sum for dates I already have.

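ExponentialSmoothing.fit() returns a statsmodels.tsa.holtwinters.HoltWintersResults object, which has two methods you can use for predicting/forecasting values: predict and forecast.

predict takes start and end observations of your data and applies the fitted exponential smoothing model to the corresponding dates. To predict future values, you have to pass an end argument that lies in the future: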

>>> fit1.predict(start=np.datetime64('2020-03-01'), end=np.datetime64('2020-03-09'))
2020-03-01    4240.649526
2020-03-02    5631.207307
2020-03-03    5508.614325
2020-03-04    5898.717779
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

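In your example, predict(3) (equivalent to predict(start=3)) returns in-sample predictions starting at position 3, which is your fourth date because positions are zero-based (hence your output starts at 2020-01-27), and running to your last observation. Since end is not in the future, it makes no forecast.

forecast() only forecasts. You just pass the number of steps you want to forecast into the future: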

>>> fit1.forecast(5)
2020-03-05    6249.810230
2020-03-06    6767.659081
2020-03-07    7328.416024
2020-03-08    7935.636353
2020-03-09    8593.169945
Freq: D, dtype: float64

Because both methods use the same fitted model (the result of ExponentialSmoothing.fit), they return equal values for the same dates.

python

prediction

data-science