Is it possible to calculate a conditional cumsum with dplyr?
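I'm trying to calculate the number of historical hits a player has in day versus night games. For example, say a player has 5 games, sorted from oldest to most recent. The dn_hits (day/night hits) column would be zero in the first row, since that is the player's first game. For the second row, dn_hits would check whether the second game was a day game or a night game, then perform a backward-looking cumsum() on the hits column, summing all hits from earlier games of the same type (day or night, as the case may be). This would happen for every row in the group.
An example data frame and the desired output are below. I have also included some pseudocode for the calculation I think needs to happen.
As you can see in the output below:
Row 1: dn_hits for player AJ's first row is 0 (there are no previous games or hits to cumsum);
Row 2: player AJ's second row is 2 (AJ's second game is a day game, and so was his first, so we conditionally sum the hits from the first game: 2 where dn = "Day");
Row 3: player AJ's third row is 0 (the third game is a night game, and before game 3 AJ had played only (2) day games, so the conditional sum of hits where dn = "Night" is 0).
Can this be done with dplyr, or is this a job for purrr?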
library(tidyverse)
df <- tibble(
  game = rep(1:5, 2),
  name = rep(c("AJ", "CJ"), each = 5),
  hits = c(2, 1, 0, 1, 3, 2, 1, 4, 1, 0),
  dn   = c("Day", "Day", "Night", "Night", "Night", "Night", "Day", "Night", "Night", "Day")
)
# Desired result: the same data plus the expected dn_hits column
output <- df %>% mutate(dn_hits = c(0, 2, 0, 0, 1, 0, 0, 2, 6, 1))
# Original tibble
df
#> # A tibble: 10 x 4
#>     game name   hits dn
#>    <int> <chr> <dbl> <chr>
#>  1     1 AJ        2 Day
#>  2     2 AJ        1 Day
#>  3     3 AJ        0 Night
#>  4     4 AJ        1 Night
#>  5     5 AJ        3 Night
#>  6     1 CJ        2 Night
#>  7     2 CJ        1 Day
#>  8     3 CJ        4 Night
#>  9     4 CJ        1 Night
#> 10     5 CJ        0 Day
# Desired Output
output
#> # A tibble: 10 x 5
#>     game name   hits dn    dn_hits
#>    <int> <chr> <dbl> <chr>   <dbl>
#>  1     1 AJ        2 Day         0
#>  2     2 AJ        1 Day         2
#>  3     3 AJ        0 Night       0
#>  4     4 AJ        1 Night       0
#>  5     5 AJ        3 Night       1
#>  6     1 CJ        2 Night       0
#>  7     2 CJ        1 Day         0
#>  8     3 CJ        4 Night       2
#>  9     4 CJ        1 Night       6
#> 10     5 CJ        0 Day         1
# This is what I think needs to happen but not sure how to implement it.
#df %>%
#group_by(name) %>%
#arrange(name, desc(game)) %>%
#mutate(dn_hits = cumsum(dn = [dn on current row],hits, 0))
EDIT: I also tried the following:
df %>%
  group_by(name) %>%
  arrange(name, desc(game)) %>%
  mutate(dn_hits = map_int(dn, ~ cumsum(if_else(.x == dn, hits, 0L))))
But I get the following error:
Error: Problem with `mutate()` input `dn_hits`.
x `false` must be a double vector, not an integer vector.
i Input `dn_hits` is `map_int(dn, ~cumsum(if_else(.x == dn, hits, 0L)))`.
i The error occurred in group 1: name = "AJ".
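You just need to add dn to the group_by() to get the 'conditional' cumulative sum:
df %>%
  group_by(name, dn) %>%
  # cumsum(hits) includes the current game, so subtract hits to count only earlier games
  mutate(dn_hits = cumsum(hits) - hits)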
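To check this against the desired output from the question, here is a minimal sketch. It assumes the rows should be ordered oldest to newest within each player, so it adds an explicit arrange() (the sample data is already in that order). The column name dn_hits_lag is hypothetical, added only to show an equivalent spelling with dplyr's lag().

```r
library(dplyr)

# Sample data from the question
df <- tibble(
  game = rep(1:5, 2),
  name = rep(c("AJ", "CJ"), each = 5),
  hits = c(2, 1, 0, 1, 3, 2, 1, 4, 1, 0),
  dn   = c("Day", "Day", "Night", "Night", "Night",
           "Night", "Day", "Night", "Night", "Day")
)

result <- df %>%
  arrange(name, game) %>%    # assumption: oldest game first within each player
  group_by(name, dn) %>%     # one running total per player and game type
  mutate(
    # running total including the current game, minus the current game's hits
    dn_hits = cumsum(hits) - hits,
    # hypothetical alternative column: the previous row's running total
    dn_hits_lag = lag(cumsum(hits), default = 0)
  ) %>%
  ungroup()

# Compare with the dn_hits values the question asks for
expected <- c(0, 2, 0, 0, 1, 0, 0, 2, 6, 1)
all.equal(result$dn_hits, expected)
all.equal(result$dn_hits_lag, expected)
```

Both comparisons should return TRUE for this sample. Grouping by dn does the "conditional" part, so no purrr mapping is needed.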