How to adjust existing code to do out-of-sample prediction in R

I have a df as follows:

 Key  Date                     y   x1   x2   x3
   1    1/10/2018 12:00:00 AM    2   3    2    5
   1    1/11/2018 12:00:00 AM    3   5    7    2
   1    1/12/2018 12:00:00 AM    5   7    4    7 
   1    1/13/2018 12:00:00 AM    7   2    7    6
   2    1/10/2018 12:00:00 AM    2   6    3    8
   2    1/11/2018 12:00:00 AM    3   7    7    3
   2    1/12/2018 12:00:00 AM    3   2    3    4
   2    1/13/2018 12:00:00 AM    7   6    2    7

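The following code lets me run a regression for each key and adds a column of predictions to the same dataset I used to fit the regressions. See the code and example below: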

# Column names are case-sensitive: the column is Key, not key
test = df[(df$Key==1 | df$Key==2),]

df_list=split(test, test$Key)
reg_results = lapply(df_list,function(temp) {

  good_cols=sapply(temp,function(x){
    is.numeric(x) && ((max(x)-min(x))!=0)
  })

  temp=temp[,good_cols]
  fit=lm(y~.,data=temp)
  return(fit)
})


#Credit to MrFlick for reg_predict code below

reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))


df_list_summary = lapply(reg_results, function(model_output){
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2,"test2.csv") 

Example of what the code produces:

 Key  Date                     y   x1   x2   x3   pred (predicted values for each date)
   1    1/10/2018 12:00:00 AM    2   3    2    5    ...
   1    1/11/2018 12:00:00 AM    3   5    7    2    ...
   1    1/12/2018 12:00:00 AM    5   7    4    7    ...
   1    1/13/2018 12:00:00 AM    7   2    7    6    ...
   2    1/10/2018 12:00:00 AM    2   6    3    8    ...
   2    1/11/2018 12:00:00 AM    3   7    7    3    ...
   2    1/12/2018 12:00:00 AM    3   2    3    4    ...
   2    1/13/2018 12:00:00 AM    7   6    2    7    ...

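Is there any way to adjust the line of code below so that I can use the regressions fitted in the earlier code to predict out of sample? I have been trying, without success.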

This is the line of code I have been working with to try to solve my problem:

reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))

Thank you,

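# Collect the predictions for every Key in one data frame
oof_predictions_df = data.frame()

for(i in seq_along(df_list)){
    # The second argument to predict() is newdata: pass a data frame of unseen
    # rows (same predictor columns) here to get out-of-sample predictions
    g <- predict(reg_results[[i]], df_list[[i]])
    f = data.frame(Key = df_list[[i]][,'Key'], Date = df_list[[i]][,'Date'], y = df_list[[i]][,'y'], pred = g)
    oof_predictions_df <- rbind(oof_predictions_df, f)
}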
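
The same idea can also be written in the Map style of the original reg_predict line. The block below is only a sketch under some assumptions: new_df, new_list, keys and oos_predict are hypothetical names, the rows for 1/14/2018 are made-up values, and base rbind is used instead of dplyr::bind_rows so the example has no package dependencies. The key change is passing newdata to predict(); without it, predict() returns the fitted values for the training rows.

```r
# Toy training data copied from the example df in the question
df <- data.frame(
  Key  = rep(1:2, each = 4),
  Date = rep(c("1/10/2018", "1/11/2018", "1/12/2018", "1/13/2018"), times = 2),
  y    = c(2, 3, 5, 7, 2, 3, 3, 7),
  x1   = c(3, 5, 7, 2, 6, 7, 2, 6),
  x2   = c(2, 7, 4, 7, 3, 7, 3, 2),
  x3   = c(5, 2, 7, 6, 8, 3, 4, 7)
)

# Fit one regression per Key, as in the question
df_list <- split(df, df$Key)
reg_results <- lapply(df_list, function(temp) {
  good_cols <- sapply(temp, function(x) is.numeric(x) && (max(x) - min(x)) != 0)
  lm(y ~ ., data = temp[, good_cols])
})

# Hypothetical unseen rows (made-up values); y is not needed for prediction
new_df <- data.frame(
  Key  = c(1, 2),
  Date = c("1/14/2018", "1/14/2018"),
  x1   = c(4, 5),
  x2   = c(6, 3),
  x3   = c(3, 6)
)
new_list <- split(new_df, new_df$Key)

# Pair each model with the new rows of the same Key, matching by name
keys <- intersect(names(reg_results), names(new_list))
oos_predict <- do.call(rbind, Map(function(data, model) {
  # newdata makes predict() score the unseen rows instead of the training rows
  data.frame(data, pred = predict(model, newdata = data))
}, new_list[keys], reg_results[keys]))

print(oos_predict)
```

predict() looks up the predictors by column name, so the new rows need every x column the corresponding model kept, and extra columns such as Key and Date are ignored. Matching the two lists by name rather than by position guards against a Key that appears in only one of them.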