Predicting values in time series for future periods with windowed dataset

Question

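I have data for the previous 3650 time steps, but I want to make predictions into the future, i.e. for the time steps after 3650. I am new to machine learning and cannot figure it out. How should I do this? For reference, Colab Notebook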

Answer 1

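A general approach for adapting tabular (or cross-sectional) regression algorithms to forecasting problems is described here. In short: you train the model on windows of lagged observations. To generate forecasts you have several options. The most common is the recursive strategy: you use the last available window to predict the first value, then update that window with the first predicted value to predict the next value, and so on.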
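The recursive strategy described above can be sketched with plain NumPy. This is only an illustration under stated assumptions: the helper names (`make_windows`, `fit_linear`, `predict_linear`, `recursive_forecast`) are hypothetical, and an ordinary least-squares fit stands in for whatever regressor you would actually use.

```python
import numpy as np

def make_windows(series, window_length):
    """Build (X, y): each row of X holds `window_length` consecutive
    observations, and y holds the value that immediately follows them."""
    series = np.asarray(series, dtype=float)
    n_windows = len(series) - window_length
    X = np.stack([series[i:i + window_length] for i in range(n_windows)])
    y = series[window_length:]
    return X, y

def fit_linear(X, y):
    """Least-squares fit with an intercept (assumed stand-in for any regressor)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, window):
    """One-step-ahead prediction from a single window."""
    return float(np.dot(coef[:-1], window) + coef[-1])

def recursive_forecast(series, coef, window_length, n_steps):
    """Forecast n_steps beyond the end of `series`, feeding each
    prediction back into the window used for the next step."""
    window = list(np.asarray(series, dtype=float)[-window_length:])
    preds = []
    for _ in range(n_steps):
        y_hat = predict_linear(coef, np.array(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the prediction
    return np.array(preds)

# Toy data: a straight line, so the continuation is easy to check by eye
series = np.arange(20, dtype=float)
X, y = make_windows(series, window_length=3)
coef = fit_linear(X, y)
forecast = recursive_forecast(series, coef, window_length=3, n_steps=5)
print(forecast)
```

Because the last window is taken from the end of the full series, the forecasts here lie beyond the available data, which is the situation in the question. Errors accumulate as predictions are fed back in, so long horizons are usually less reliable than short ones.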

If you are interested, we are developing a toolbox that extends scikit-learn for exactly these use cases. With sktime, you can simply write:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sktime.datasets import load_airline
from sktime.forecasting.compose import RecursiveTabularRegressionForecaster
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

y = load_airline()  # load 1-dimensional time series
y_train, y_test = temporal_train_test_split(y)  # split in time order, no shuffling
fh = np.arange(1, len(y_test) + 1)  # forecasting horizon: steps 1..len(y_test) ahead
regressor = RandomForestRegressor(random_state=3)
# recursive strategy: each window holds the 10 most recent observations
forecaster = RecursiveTabularRegressionForecaster(regressor, window_length=10)
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
print(mean_absolute_percentage_error(y_test, y_pred, symmetric=True))
# output: 0.1440354514063762
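For readers unsure what the printed score means, here is a minimal sketch of a symmetric MAPE. It assumes the common definition, mean of 2|y - ŷ| / (|y| + |ŷ|); the function name `symmetric_mape` is hypothetical, and sktime's own implementation may differ in details such as how zero denominators are handled.

```python
import numpy as np

def symmetric_mape(y_true, y_pred):
    """Symmetric mean absolute percentage error (assumed common definition).
    Undefined where both the true and the predicted value are zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ratio = 2.0 * np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred))
    return float(np.mean(ratio))

# Two made-up observations and forecasts, purely for illustration
score = symmetric_mape([100.0, 200.0], [110.0, 180.0])
print(score)
```

Under this definition, a value around 0.14 means the forecasts are off by roughly 14% on average, measured relative to the average magnitude of the actual and predicted values.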

machine-learning

time-series

prediction

data-science

tensorflow