Run regression by group and paste predicted values into the original data frame (sample code provided)
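I have code that runs automated regressions by group, but I am struggling to work out how to use the predict function so that the predictions are pasted into the original data set for each date.
Thanks,
The code I currently have: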
test = df[(df$key==1 | df$key==2),]
df_list = split(test, test$key)
reg_results = lapply(df_list, function(temp) {
  good_cols = sapply(temp, function(x) {
    is.numeric(x) && ((max(x)-min(x)) > 10000)
  })
  temp = temp[, good_cols]
  fit = step(lm(y~., data=temp))
  return(fit)
})
df_list_summary = lapply(reg_results, function(model_output) {
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2, "test2.csv")
Sample df:
Key Date y x1 x2 x3
1 1/10/2018 12:00:00 AM 2 3 2 5
1 1/11/2018 12:00:00 AM 3 5 7 2
1 1/12/2018 12:00:00 AM 5 7 4 7
1 1/13/2018 12:00:00 AM 7 2 7 6
2 1/10/2018 12:00:00 AM 2 6 3 8
2 1/11/2018 12:00:00 AM 3 7 7 3
2 1/12/2018 12:00:00 AM 3 2 3 4
2 1/13/2018 12:00:00 AM 7 6 2 7
Desired result:
Key Date y x1 x2 x3 predicted values for each date
1 1/10/2018 12:00:00 AM 2 3 2 5 ...
1 1/11/2018 12:00:00 AM 3 5 7 2 ...
1 1/12/2018 12:00:00 AM 5 7 4 7 ...
1 1/13/2018 12:00:00 AM 7 2 7 6 ...
2 1/10/2018 12:00:00 AM 2 6 3 8 ...
2 1/11/2018 12:00:00 AM 3 7 7 3 ...
2 1/12/2018 12:00:00 AM 3 2 3 4 ...
2 1/13/2018 12:00:00 AM 7 6 2 7 ...
What I have tried so far, to no avail:
test2 = df[(df$key==1 | df$key==2),]
unsplit(lapply(split(test, test$key), function(w) {
  reg_results = lapply(df_list, function(temp) {
    good_cols = sapply(temp, function(x) {
      is.numeric(x) && ((max(x)-min(x)) > 10000)
    })
    temp = temp[, good_cols]
    fit = lm(y~., data=temp)
  })
  cbind(w, predict(fit, subset(df, key=="1" | key=="2")))
}), test$key)
df_list_summary = lapply(reg_results, function(model_output) {
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="key's")
readr::write_csv(final_step2, "test2.csv")
Update:
MrFlick's code worked. However, I am now trying to figure out how to apply it to out_of_sample_df. Can anyone help?
test = df[(df$key==1 | df$key==2),]
df_list = split(test, test$key)
reg_results = lapply(df_list, function(temp) {
  good_cols = sapply(temp, function(x) {
    is.numeric(x) && ((max(x)-min(x)) > 10000)
  })
  temp = temp[, good_cols]
  fit = step(lm(y~., data=temp))
  return(fit)
})
#MrFlicks contribution - need help to adjust this line of code to apply to out of sample data to produce prediction results. Currently this line of code inserts pred column inside original data set.
reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))
df_list_summary = lapply(reg_results, function(model_output) {
  broom::tidy(model_output)
})
final_step2 = dplyr::bind_rows(df_list_summary, .id="Key's")
readr::write_csv(final_step2, "test2.csv")
Thanks,
You can use Map() to iterate over the data and the models together to get the result you want. Starting from your original code, you can do:
reg_predict = dplyr::bind_rows(Map(function(data, model) {
  data.frame(data, pred=predict(model))
}, df_list, reg_results))
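For the out-of-sample case raised in the update, the same Map() pattern should carry over if the new data is split by the same key and passed to predict() through its newdata argument. The following is only a sketch under some assumptions: the contents of out_of_sample_df are made up for illustration, the key column is assumed to be named key (lowercase, as in the question's code), and a fixed formula y ~ x1 + x2 stands in for the question's column filter plus step(), since that filter would drop every column of the tiny sample df. It also uses base R do.call(rbind, ...) in place of dplyr::bind_rows so that it runs without extra packages.

```r
# Training data: the sample df from the question (Date column omitted for brevity).
df <- data.frame(
  key = rep(c(1, 2), each = 4),
  y   = c(2, 3, 5, 7, 2, 3, 3, 7),
  x1  = c(3, 5, 7, 2, 6, 7, 2, 6),
  x2  = c(2, 7, 4, 7, 3, 7, 3, 2),
  x3  = c(5, 2, 7, 6, 8, 3, 4, 7)
)

# Hypothetical out-of-sample data (values invented for this sketch).
# It has the same predictor columns; y is not needed for prediction.
out_of_sample_df <- data.frame(
  key = c(1, 1, 2, 2),
  x1  = c(4, 6, 3, 5),
  x2  = c(3, 5, 4, 6),
  x3  = c(6, 3, 5, 4)
)

# One model per key. Assumption: a fixed formula replaces the question's
# good_cols filter and step() so the sketch runs on this small sample.
df_list <- split(df, df$key)
reg_results <- lapply(df_list, function(temp) lm(y ~ x1 + x2, data = temp))

# Split the new data by the same key and keep only keys that have a model,
# selecting both lists by name so data and model stay paired.
new_list <- split(out_of_sample_df, out_of_sample_df$key)
keys <- intersect(names(reg_results), names(new_list))

# predict() with newdata scores the new rows instead of the training rows.
oos_predict <- do.call(rbind, Map(function(data, model) {
  data.frame(data, pred = predict(model, newdata = data))
}, new_list[keys], reg_results[keys]))
rownames(oos_predict) <- NULL

print(oos_predict)
```

With models produced by step(), each group may end up with different terms. That should not matter here, because predict() looks up the variables it needs in newdata by name, so the new data only has to contain every column the final formula uses. Indexing both lists by keys avoids pairing a group's data with another group's model when a key is missing from one side.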