Is it possible to calculate a conditional cumsum with dplyr?

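I am trying to calculate a player's historical hits in day versus night games. For example, suppose a player has 5 games, sorted from oldest to most recent. The dn_hits (day/night hits) column in the first row would be zero, because that is the first game. For the second row, dn_hits would check whether the second game is a day game or a night game and then perform a backward-looking cumsum() on the hits column, summing all hits from earlier day games or earlier night games, as the case may be. This would happen for every row in the group.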

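Below is an example data frame and the desired output. I have also included some pseudocode for the calculation I think needs to be done.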

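You can see in the output below:

Row 1: the dn_hits column for player AJ's first row is 0 (there are no earlier games or hits to cumsum);

Row 2: player AJ's second row is 2 (AJ's second game is a day game, and so was his first game, so we conditionally sum the hits from the first game where dn = "Day", which gives 2);

Row 3: player AJ's third row is 0 (the third game is a night game, and before game 3 player AJ had only played two day games, so the conditional sum of hits where dn = "Night" is 0).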

Can this be done with dplyr, or is it a job for purrr?

library(tidyverse)

df <- tibble(
  game = c(1:5, 1:5),
  name = rep(c("AJ", "CJ"), each = 5),
  hits = c(2, 1, 0, 1, 3, 2, 1, 4, 1, 0),
  dn   = c("Day", "Day", "Night", "Night", "Night", "Night", "Day", "Night", "Night", "Day")
)

# Same data plus the dn_hits column I would like to compute
output <- df %>% mutate(dn_hits = c(0, 2, 0, 0, 1, 0, 0, 2, 6, 1))

# Original tibble

df
#> # A tibble: 10 x 4
#>     game name   hits dn   
#>    <int> <chr> <dbl> <chr>
#>  1     1 AJ        2 Day  
#>  2     2 AJ        1 Day  
#>  3     3 AJ        0 Night
#>  4     4 AJ        1 Night
#>  5     5 AJ        3 Night
#>  6     1 CJ        2 Night
#>  7     2 CJ        1 Day  
#>  8     3 CJ        4 Night
#>  9     4 CJ        1 Night
#> 10     5 CJ        0 Day

# Desired Output

output
#> # A tibble: 10 x 5
#>     game name   hits dn    dn_hits
#>    <int> <chr> <dbl> <chr>   <dbl>
#>  1     1 AJ        2 Day         0
#>  2     2 AJ        1 Day         2
#>  3     3 AJ        0 Night       0
#>  4     4 AJ        1 Night       0
#>  5     5 AJ        3 Night       1
#>  6     1 CJ        2 Night       0
#>  7     2 CJ        1 Day         0
#>  8     3 CJ        4 Night       2
#>  9     4 CJ        1 Night       6
#> 10     5 CJ        0 Day         1


# This is what I think needs to happen but not sure how to implement it. 

#df %>% 
#group_by(name) %>%
#arrange(name, desc(game)) %>% 
#mutate(dn_hits = cumsum(dn = [dn on current row],hits, 0))

Edit: I also tried the following:

df %>%
  group_by(name) %>%
  arrange(name, desc(game)) %>%
  mutate(dn_hits = map_int(dn, ~ cumsum(if_else(.x == dn, hits, 0L))))

But I get the following error:

Error: Problem with `mutate()` input `dn_hits`.
x `false` must be a double vector, not an integer vector.
i Input `dn_hits` is `map_int(dn, ~cumsum(if_else(.x == dn, hits, 0L)))`.
i The error occurred in group 1: name = "AJ".

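You just need to add dn to the group_by() to get the 'conditional' cumulative sum. Subtracting hits from cumsum(hits) removes the current game, so each row counts only the earlier games of the same type: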

df %>% 
  group_by(name, dn) %>% 
  mutate(dn_hits = cumsum(hits)-hits)
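
As a rough check, the sketch below rebuilds the example data, applies the grouped cumsum, and compares the result with the desired dn_hits values. It also includes a per-row purrr variant, since the question asks about purrr. Two things here are my own assumptions rather than part of the original code: the explicit arrange(name, game), which only matters if the rows are not already sorted oldest to newest, and the names expected, res_grouped and res_purrr. On the failed attempt: in the dplyr version that produced the error above, if_else() required `true` and `false` to have the same type (hits is double, 0L is integer), and map_int() also expects each call to return a single integer, whereas cumsum() returns a whole vector.

```r
library(dplyr)
library(purrr)

# Example data from the question
df <- tibble(
  game = c(1:5, 1:5),
  name = rep(c("AJ", "CJ"), each = 5),
  hits = c(2, 1, 0, 1, 3, 2, 1, 4, 1, 0),
  dn   = c("Day", "Day", "Night", "Night", "Night",
           "Night", "Day", "Night", "Night", "Day")
)

# Desired dn_hits values, copied from the question
expected <- c(0, 2, 0, 0, 1, 0, 0, 2, 6, 1)

# Grouped approach: sort oldest-first (assumption: game gives the order),
# then take the running total within each (name, dn) group and subtract
# the current game's hits so only earlier games are counted.
res_grouped <- df %>%
  arrange(name, game) %>%
  group_by(name, dn) %>%
  mutate(dn_hits = cumsum(hits) - hits) %>%
  ungroup()

# purrr variant: for row i within a player, sum the hits of all earlier
# rows whose dn matches row i. map_dbl() is used because hits is double
# and each call returns exactly one number.
res_purrr <- df %>%
  arrange(name, game) %>%
  group_by(name) %>%
  mutate(dn_hits = map_dbl(
    seq_along(hits),
    function(i) {
      earlier <- seq_len(i - 1)  # empty for the first game
      sum(hits[earlier][dn[earlier] == dn[i]])
    }
  )) %>%
  ungroup()

# Both approaches should reproduce the desired output
stopifnot(
  isTRUE(all.equal(res_grouped$dn_hits, expected)),
  isTRUE(all.equal(res_purrr$dn_hits, expected))
)
```

The grouped version is the simpler of the two and does a single pass per group; the purrr version rescans the earlier rows for every game, so it is mainly useful when the condition cannot be expressed as a grouping variable.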