purrr::accumulate() on two accumulated variables, not just one

I have a model that uses its own previous prediction as a predictor, e.g. target ~ lag(target prediction).
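To make the recursion concrete, here is a minimal sketch of that idea with purrr::accumulate(). The coefficients, the data and the names coefs, x, y0 and step are all made up for illustration and are not taken from the model in this post.

```r
library(purrr)

# Hypothetical coefficients for y_t = a + b * x_t + c * y_{t-1}
coefs <- c(a = 1, b = 2, c = 0.5)
x <- c(0, 1, 2, 3) # made-up predictor values
y0 <- 10           # made-up starting value for the recursion

# accumulate() calls .f with the running value (acc) and the current element (i)
step <- function(acc, i) {
  coefs[["a"]] + coefs[["b"]] * x[i] + coefs[["c"]] * acc
}

# Start from y0 and feed each result into the next step
preds <- accumulate(seq_along(x)[-1], step, .init = y0)
```

Each element of preds is computed from the element before it, which is the same pattern as feeding the previous prediction into the model as the lagged predictor.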

Using purrr::accumulate() I can write a custom function to generate the predictions. Some silly example data and a silly model to illustrate:

### A model that uses a lagged prediction as a predictor, using purrr::accumulate() ###
library(dplyr)
library(purrr)
library(ggplot2) # provides the diamonds data set

my_diamonds <- diamonds %>% 
  group_by(cut) %>% 
  mutate(cumprice = cumsum(price)) %>% # cumulative within groups
  mutate(lag_cumprice = lag(cumprice)) %>% 
  mutate(InitialValue = min(cumprice)) %>% 
  filter(!is.na(lag_cumprice)) %>% 
  select(cut, cumprice, lag_cumprice, x, InitialValue)

silly_model <- glm(formula = cumprice ~ x + lag_cumprice, family = 'poisson', data = my_diamonds)

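This model uses the previous prediction as an input to the next prediction. I was able to write a custom function to compute the predictions inside mutate():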

# when predicting, lag_cumprice is not available; instead, the result of the previous prediction is the input to the model:
accPrice <- function(mod, acc, cur) {

  db <- cur_data_all() # data for the current group
  x <- db$x[cur] # cur is the current row index; use it to get this iteration's value of x

  total_exponent <- mod$coefficients['(Intercept)'] +
    (mod$coefficients['x'] * x) +
    (mod$coefficients['lag_cumprice'] * acc) # acc is the accumulated prediction for cumprice

  total_exponent # return the linear predictor explicitly
}

# now predict
my_diamonds <- my_diamonds %>% 
  mutate(predicted = accumulate(.x = row_number()[-1], .init = InitialValue %>% unique, .f = accPrice, mod = silly_model))

So far so good. In this example, I use the previous prediction, acc, as an input.
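For reference, a small sketch of how accumulate() fills the arguments of a function shaped like accPrice(mod, acc, cur): the extra argument is matched by name, and the running value and the current element are then matched by position. The function show_args and the string values are hypothetical and exist only to make the matching visible.

```r
library(purrr)

# Same argument order as accPrice(): mod first, then acc, then cur
show_args <- function(mod, acc, cur) {
  paste0(acc, "|", cur, "|", mod)
}

# mod is supplied by name through ...; acc and cur are filled positionally
out <- accumulate(2:3, show_args, mod = "m", .init = "start")
```

Because only one running value is passed along this way, a second accumulated quantity has to travel inside that same value.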

However, I have created a variation of the model that now uses two lagged variables as predictors:

### now a model with lag on two variables not just one ###
my_diamonds2 <- diamonds %>% 
  group_by(cut) %>% 
  mutate(cumprice = cumsum(price)) %>% # cumulative within groups
  mutate(lag_cumprice = lag(cumprice)) %>% 
  mutate(InitialValue = min(cumprice)) %>% 
  mutate(rn = row_number()) %>% 
  mutate(cumrn = cumsum(rn)) %>% 
  mutate(lag_cumrn = lag(cumrn)) %>% 
  filter(!is.na(lag_cumprice)) %>% 
  select(cut, cumprice, lag_cumprice, lag_cumrn, x, InitialValue)

silly_model2 <- glm(formula = cumprice ~ x + lag_cumprice + lag_cumrn, family = 'poisson', data = my_diamonds2)

### Stuck after here ###

How can I modify the function accPrice() above to accumulate two variables, lag_cumprice and lag_cumrn, rather than just lag_cumprice as before?

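We can add a parameter to the function, then extract the corresponding coefficient from the model and multiply it by that parameter: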

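accPrice2 <- function(mod, acc, acc2, cur) {

  db <- cur_data_all() # data for the current group
  x <- db$x[cur] # cur is the current row index; use it to get this iteration's value of x

  total_exponent <- mod$coefficients['(Intercept)'] +
    (mod$coefficients['x'] * x) +
    (mod$coefficients['lag_cumprice'] * acc) + # acc is the accumulated prediction for cumprice
    (mod$coefficients['lag_cumrn'] * acc2) # acc2 is the accumulated value for cumrn

  total_exponent # return the linear predictor explicitly
}

# accumulate() carries a single running value, so pack both variables into one list
my_diamonds2 <- my_diamonds2 %>%
  mutate(predicted = accumulate(
    .x = row_number()[-1],
    .init = list(price = unique(InitialValue), rn = first(lag_cumrn)),
    .f = function(acc, cur) {
      rn <- acc$rn + cur # lag_cumrn grows by the row index from one row to the next
      list(price = accPrice2(silly_model2, acc$price, rn, cur), rn = rn)
    }
  ) %>% map_dbl("price")) # keep only the price component of each step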
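As a self-contained illustration of carrying two running values through accumulate(), here is a minimal sketch that keeps both in a list and unpacks them afterwards. It is one possible pattern, not necessarily the only one; the coefficients, the data and the names coefs, x, step2, states, preds2 and cum_n are all made up and unrelated to the diamonds model.

```r
library(purrr)

x <- c(0, 1, 2, 3)                         # made-up predictor values
coefs <- c(a = 1, b = 2, c = 0.5, d = 0.1) # hypothetical coefficients

# The accumulator is a list with two running values:
#   y = previous prediction, n = cumulative sum of the index
step2 <- function(acc, i) {
  n <- acc$n + i
  y <- coefs[["a"]] + coefs[["b"]] * x[i] +
    coefs[["c"]] * acc$y + coefs[["d"]] * n
  list(y = y, n = n)
}

# One list per step, starting from the initial state
states <- accumulate(seq_along(x)[-1], step2, .init = list(y = 10, n = 1))

# Pull each running value out into its own numeric vector
preds2 <- map_dbl(states, "y")
cum_n <- map_dbl(states, "n")
```

Keeping the state in a named list makes it easy to add a third running value later without changing how accumulate() is called.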