Time Series Forecasting using Random Forest in R

Question

I am trying to use random forest for time series analysis. Please find my code below:

Subsales<-read.csv('Sales.csv')
head(Subsales)

Sample data:

        Date                                 SKU    City Sales
      <date>                               <chr>   <chr> <dbl>
1 2014-08-11 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   378
2 2014-08-18 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   348
3 2014-08-25 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   314
4 2014-09-01 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   324
5 2014-09-08 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   352
6 2014-09-15 Vaseline Petroleum Jelly Pure 60 ml Jeddah1   453


#### Length of training & testing sets: 80-20 split ####
library(dplyr)  # provides slice()

train_len <- round(nrow(Subsales) * 0.8)
test_len <- nrow(Subsales)

###### Splitting dataset into training and testing #####

#### Training set
training <- slice(Subsales, 1:train_len)
#### Testing set (parentheses needed: `:` binds tighter than `+`)
testing <- slice(Subsales, (train_len + 1):test_len)

# Keep only the Date and Sales columns
training <- training[c(1, 4)]
testing <- testing[c(1, 4)]

library(randomForest)
set.seed(1234)
regressor = randomForest(formula=Sales~.,
                data=training,
                ntree=100)

y_pred = predict(regressor,newdata = testing)

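When I use the predict function on the test data, I get a flat result set: all predicted values are 369. I have tried another dataset and got the same result. Can anyone tell me what I am doing wrong?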

Answer 1

Let me try to rephrase your question to make sure I understand exactly what you want to do.

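You have a product's sales by date (one observation per week in your sample), and you want to predict sales as a function of future dates. You don't have any other predictor variables, such as the number of customers, the amount spent on advertising, or anything else. Your input data looks like this: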

Date        Sales
2014-08-11  378
2014-08-18  348
2014-08-25  314
2014-09-01  324
2014-09-08  352
2014-09-15  453
...

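I think your random forest is behaving as expected. Random forest is a supervised machine learning algorithm that tries to predict y (the response, here: Sales) given input variables x (the predictors). Here, the only x you supply is the date. However, every date in the test set is completely new to the random forest: it lies beyond every split point learned from the training dates, so the algorithm can only fall back on the same, roughly average sales figure for each of them.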
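To make this concrete, here is a minimal sketch on made-up data (the `toy` data frame, its trend, and the noise level are assumptions for illustration, not your sales data). The date is converted to a plain number so that the only predictor is "time":

```r
library(randomForest)

set.seed(1234)

# Synthetic weekly series with a clear upward trend (made-up numbers)
toy <- data.frame(
  Date = seq(as.Date("2014-08-11"), by = "week", length.out = 100)
)
toy$DateNum <- as.numeric(toy$Date)  # days since 1970-01-01
toy$Sales   <- 300 + 2 * seq_len(100) + rnorm(100, sd = 10)

# Chronological 80-20 split
train <- toy[1:80, ]
test  <- toy[81:100, ]

fit  <- randomForest(Sales ~ DateNum, data = train, ntree = 100)
pred <- predict(fit, newdata = test)

# Every test date is later than every training date, so in each tree
# all test rows end up in the same terminal leaf
length(unique(pred))  # number of distinct predicted values
range(test$Sales)     # the actual values keep rising
```

Even though the series clearly trends upward, the forest returns a single value for all future dates, which matches the flat 369 you are seeing.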

You have two options:

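Option 1) Stick with your approach of using only the date as a predictor. You will then need a different method, perhaps an autoregressive approach such as ARIMA. Such a method tries to detect trends in the data: are sales more or less stationary, growing, or declining? Is there a weekly, monthly, or yearly pattern? An example to help you get started can be found here.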
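As a rough sketch of what Option 1 could look like using only base R, the snippet below fits an AR(1) model around a linear time trend with `stats::arima` (strictly speaking, a regression with ARIMA errors). The synthetic series, the model order, and the variable names are assumptions for illustration; a real analysis would choose the order from the data.

```r
set.seed(1234)

# Synthetic weekly sales: linear trend plus autocorrelated noise (made-up numbers)
n     <- 100
noise <- arima.sim(list(ar = 0.5), n = n, sd = 10)
sales <- ts(300 + 2 * seq_len(n) + as.numeric(noise))

# Chronological 80-20 split
train_ts <- window(sales, end = 80)
train_t  <- 1:80    # time index used as the trend regressor
test_t   <- 81:100

# AR(1) errors around a linear trend in time
fit <- arima(train_ts, order = c(1, 0, 0), xreg = train_t)

# Forecast the 20 held-out weeks
fc <- predict(fit, n.ahead = 20, newxreg = test_t)
fc$pred  # point forecasts
fc$se    # forecast standard errors
```

Unlike the random forest, this model can extend the trend past the last training date, so the forecasts are not flat.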

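Option 2) Use data collection and feature engineering to create features that help your random forest predict values for new dates. For example, try to get data on how many customers came to the store on any given date, or extract the day of the week (Monday, Tuesday, ...) and store it as a separate variable. The R package lubridate will help you do this. Here is a short example: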

library(dplyr)      # mutate()
library(lubridate)  # wday()
Subsales <- mutate(Subsales, Weekday = wday(Date, label = TRUE))
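One caveat: the dates in your sample are exactly seven days apart, so the weekday would be the same for every row and carry no information. For weekly data, calendar features such as the month or the week of the year are likely more useful. Below is a minimal sketch of Option 2 on made-up seasonal data (the `toy` data frame and its seasonal pattern are assumptions for illustration):

```r
library(randomForest)
library(lubridate)

set.seed(1234)

# Synthetic weekly sales with a yearly seasonal pattern (made-up numbers)
toy <- data.frame(
  Date = seq(as.Date("2014-08-11"), by = "week", length.out = 130)
)
toy$Month <- month(toy$Date)    # 1..12
toy$Week  <- isoweek(toy$Date)  # ISO week of the year
toy$Sales <- 350 + 60 * sin(2 * pi * toy$Month / 12) + rnorm(130, sd = 5)

# Train on the first two years, test on the following half year
train <- toy[1:104, ]
test  <- toy[105:130, ]

# Calendar features repeat every year, so the test values are not "new"
fit  <- randomForest(Sales ~ Month + Week, data = train, ntree = 100)
pred <- predict(fit, newdata = test)

# Compare predictions with the held-out values
head(data.frame(Date = test$Date, Actual = test$Sales, Predicted = round(pred, 1)))
```

Because the forest has already seen each month and week number during training, the predictions now vary with the season instead of collapsing to one value. Note that this still cannot capture a long-term trend, since trees do not extrapolate.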

Hope this helps!
