purrr::accumulate() on two accumulated variables, not just one
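I have a model that uses its own previous prediction as a predictor, e.g. target ~ lag(target prediction).
Using purrr::accumulate() I can write a custom function to make the predictions. Here is some silly example data and a silly model to illustrate: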
### A model that uses a lag prediction as a predictor using purrr::accumulate() ###
library(dplyr)
library(purrr)
library(ggplot2)  # provides the diamonds data set

my_diamonds <- diamonds %>%
  group_by(cut) %>%
  mutate(cumprice = cumsum(price)) %>%  # cumulative within groups
  mutate(lag_cumprice = lag(cumprice)) %>%
  mutate(InitialValue = min(cumprice)) %>%
  filter(!is.na(lag_cumprice)) %>%
  select(cut, cumprice, lag_cumprice, x, InitialValue)

silly_model <- glm(formula = cumprice ~ x + lag_cumprice, family = 'poisson', data = my_diamonds)
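This model uses the previous prediction as the input for the next prediction. I was able to write a custom function to mutate the predictions: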
# When predicting we won't have lag_cumprice; instead, the result of the previous prediction should be an input to the model:
accPrice <- function(mod, acc, cur) {
  db <- cur_data_all()  # grouped data segment
  x <- db$x[cur]        # cur is the current row in the data; use it to get this iteration's value of x
  total_exponent <- mod$coefficients['(Intercept)'] +
    (mod$coefficients['x'] * x) +
    (mod$coefficients['lag_cumprice'] * acc)  # acc is the accumulated prediction for cumprice
  total_exponent  # linear predictor, i.e. on the log scale of the poisson model
}

# now predict
my_diamonds <- my_diamonds %>%
  mutate(predicted = accumulate(.x = row_number()[-1], .init = unique(InitialValue), .f = accPrice, mod = silly_model))
So far so good. In this example I use the previous prediction, acc, as the input.
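For readers less familiar with accumulate(), here is a minimal sketch of how the arguments reach the function. The names add_scaled and scale are made up for illustration; the point is only that .f receives the accumulated value and the current element of .x positionally, while a named extra argument (like mod above) is matched by name, which is why it can sit first in the signature.

```r
library(purrr)

# Hypothetical helper: `scale` plays the role of `mod` in accPrice().
# accumulate() calls .f(acc, cur, scale = 10); R matches `scale` by name
# and fills `acc` and `cur` positionally.
add_scaled <- function(scale, acc, cur) acc + scale * cur

# .init seeds the accumulator, so the result has length(.x) + 1 elements.
running <- accumulate(2:4, add_scaled, scale = 10, .init = 1)
running
# 1 21 51 91
```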
However, I have created a variant of the model that now uses two lagged variables as predictors:
### now a model with lag on two variables not just one ###
my_diamonds2 <- diamonds %>%
  group_by(cut) %>%
  mutate(cumprice = cumsum(price)) %>%  # cumulative within groups
  mutate(lag_cumprice = lag(cumprice)) %>%
  mutate(InitialValue = min(cumprice)) %>%
  mutate(rn = row_number()) %>%
  mutate(cumrn = cumsum(rn)) %>%
  mutate(lag_cumrn = lag(cumrn)) %>%
  filter(!is.na(lag_cumprice)) %>%
  select(cut, cumprice, lag_cumprice, lag_cumrn, x, InitialValue)

silly_model2 <- glm(formula = cumprice ~ x + lag_cumprice + lag_cumrn, family = 'poisson', data = my_diamonds2)
### Stuck after here ###
How can I modify the function accPrice() above to accumulate two variables, lag_cumprice and lag_cumrn, rather than just lag_cumprice as before?
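We can add an argument to the function, then extract the corresponding coefficient from the model and multiply it by that argument. Note that accumulate() only passes two values to .f (the accumulator and the current element of .x), so the extra value is supplied here through accumulate2() and its .y argument, and the model has to be silly_model2 for the lag_cumrn coefficient to exist:
accPrice2 <- function(mod, acc, cur, acc2) {
  db <- cur_data_all()  # grouped data segment
  x <- db$x[cur]        # cur is the current row in the data; use it to get this iteration's value of x
  total_exponent <- mod$coefficients[['(Intercept)']] +
    (mod$coefficients[['x']] * x) +
    (mod$coefficients[['lag_cumprice']] * acc) +  # acc is the previous prediction
    (mod$coefficients[['lag_cumrn']] * acc2)      # acc2 is lag_cumrn for the current row
  total_exponent
}

my_diamonds2 %>%
  mutate(predicted = unlist(accumulate2(.x = row_number()[-1],
                                        .y = lag_cumrn[-1],  # lag_cumrn for rows 2..n of the group
                                        .init = unique(InitialValue),
                                        .f = accPrice2, mod = silly_model2)))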
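If the second quantity should itself be carried forward from step to step rather than read from the data, one option is to pack both values into a single accumulator. The sketch below assumes made-up coefficients and data (b, x and step are hypothetical stand-ins for coef(silly_model2), the x column and accPrice2) and only illustrates the pattern: the state is a two-element vector, .f returns the updated pair, and map_dbl() pulls each component back out.

```r
library(purrr)

# Hypothetical coefficients standing in for coef(silly_model2)
b <- c(intercept = 0.5, x = 0.1, lag_cumprice = 0.2, lag_cumrn = 0.01)
x <- c(1, 2, 3, 4)  # made-up predictor values for rows 1..4

# state[[1]] = previous prediction, state[[2]] = previous lag_cumrn
step <- function(state, cur) {
  lag_cumrn <- state[[2]] + cur  # running sum of row numbers
  pred <- b[["intercept"]] + b[["x"]] * x[cur] +
    b[["lag_cumprice"]] * state[[1]] +
    b[["lag_cumrn"]] * lag_cumrn
  c(pred, lag_cumrn)             # return both accumulated values
}

# .init holds the starting prediction and the starting lag_cumrn
states <- accumulate(seq_along(x)[-1], step, .init = c(10, 1))

predicted <- map_dbl(states, 1)  # first component of every state
lag_cumrn <- map_dbl(states, 2)  # second component of every state
```

Inside mutate() the same idea would use .init = c(unique(InitialValue), first(lag_cumrn)) and wrap the accumulate() call in map_dbl(..., 1) so that only the prediction is stored in the new column.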