How to adjust existing code to do out-of-sample prediction in R
I have a df as follows:
Key Date y x1 x2 x3
1 1/10/2018 12:00:00 AM 2 3 2 5
1 1/11/2018 12:00:00 AM 3 5 7 2
1 1/12/2018 12:00:00 AM 5 7 4 7
1 1/13/2018 12:00:00 AM 7 2 7 6
2 1/10/2018 12:00:00 AM 2 6 3 8
2 1/11/2018 12:00:00 AM 3 7 7 3
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
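The following code lets me run a regression for each key and adds a column of predictions to the dataset I used to fit the regression. See the code and sample output below:
# column names are case-sensitive in R: the column is Key, not key
test = df[(df$Key==1 | df$Key==2),]
df_list=split(test, test$Key)
reg_results = lapply(df_list,function(temp) {
  # keep only numeric columns that actually vary within this key
  good_cols=sapply(temp,function(x){
    is.numeric(x) && ((max(x)-min(x))!=0)
  })
  temp=temp[,good_cols]
  fit=lm(y~.,data=temp)
  return(fit)
})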
#Credit to MrFlick for reg_predict code below
reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))
df_list_summary = lapply(reg_results, function(model_output){
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2,"test2.csv")
Sample of what the code produces:
Key Date y x1 x2 x3 predicted values for each date
1 1/10/2018 12:00:00 AM 2 3 2 5 ...
1 1/11/2018 12:00:00 AM 3 5 7 2 ...
1 1/12/2018 12:00:00 AM 5 7 4 7 ...
1 1/13/2018 12:00:00 AM 7 2 7 6 ...
2 1/10/2018 12:00:00 AM 2 6 3 8 ...
2 1/11/2018 12:00:00 AM 3 7 7 3 ...
2 1/12/2018 12:00:00 AM 3 2 3 4 ...
2 1/13/2018 12:00:00 AM 7 6 2 7 ...
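Is there a way to adjust the line of code below so that I can flexibly use the regressions built in the earlier code to predict out of sample? I have been trying, but have had no success.
This is the line of code I have been working with to try to solve my problem:
reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))
Thank you,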
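# predict() takes the data to score as its second argument (newdata)
oof_predictions_df = data.frame()
for(i in seq_along(df_list)){
  # swap df_list[[i]] for a data frame of new observations to predict out of sample
  g <- predict(reg_results[[i]], df_list[[i]])
  f = data.frame(Key = df_list[[i]][,'Key'], Date = df_list[[i]][,'Date'], y = df_list[[i]][,'y'], pred = g)
  oof_predictions_df <- rbind(oof_predictions_df, f)
}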
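The same idea can be kept in the Map() form from the question. With no second argument, predict(model) returns the fitted values for the training rows only; passing newdata scores whatever rows you hand it. The following is a minimal sketch, not a definitive implementation. The names new_df, new_list, keys and oos_predict are hypothetical, and the values in new_df are made up purely for illustration. It assumes the new data has the same predictor columns (x1, x2, x3) as the training data.

```r
# Training data copied from the sample table in the question (dates kept as text).
df <- data.frame(
  Key  = rep(1:2, each = 4),
  Date = rep(c("1/10/2018", "1/11/2018", "1/12/2018", "1/13/2018"), times = 2),
  y    = c(2, 3, 5, 7, 2, 3, 3, 7),
  x1   = c(3, 5, 7, 2, 6, 7, 2, 6),
  x2   = c(2, 7, 4, 7, 3, 7, 3, 2),
  x3   = c(5, 2, 7, 6, 8, 3, 4, 7)
)

# Fit one regression per key, as in the question.
df_list <- split(df, df$Key)
reg_results <- lapply(df_list, function(temp) {
  good_cols <- sapply(temp, function(x) {
    is.numeric(x) && ((max(x) - min(x)) != 0)
  })
  lm(y ~ ., data = temp[, good_cols])
})

# Hypothetical new observations (illustrative values, no y column needed).
new_df <- data.frame(
  Key  = c(1, 1, 2),
  Date = c("1/14/2018", "1/15/2018", "1/14/2018"),
  x1   = c(4, 6, 5),
  x2   = c(3, 5, 4),
  x3   = c(6, 4, 5)
)
new_list <- split(new_df, new_df$Key)

# Match new data to models by key name, and score only keys that have a model.
keys <- intersect(names(new_list), names(reg_results))

# predict(model, newdata = data) scores the new rows instead of the training rows.
oos_predict <- do.call(rbind, Map(function(data, model) {
  data.frame(data, pred = predict(model, newdata = data))
}, new_list[keys], reg_results[keys]))

print(oos_predict)
```

Matching by key name rather than by position guards against the two lists being in a different order or covering different keys. dplyr::bind_rows() can replace do.call(rbind, ...) if you prefer to stay with the packages already used in the question. Note that with only four rows per key and three predictors the sample fits are saturated, so real use would need more observations per key.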