Count of values with a condition and group

I have a data frame, sorted by groupID and date:

d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3), 
                 date = c(1,2,3,4,5,6,7,8,9),
                 value = c(1,1,25,1,1,25,1,25,1))

> d1
 groupID date value
       1    1     1
       1    2     1
       1    3    25
       1    4     1
       1    5     1
       3    6    25
       3    7     1
       3    8    25
       3    9     1  

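I want to create two new columns:

  1. For each occurrence of 25, per group, the count of values = 1 that precede it
  2. For each occurrence of 25, per group, the count of values = 1 that follow it, up to the next value = 25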

Desired output:

 groupID date value Prev1s After1s
       1    1     1
       1    2     1
       1    3    25      2       2
       1    4     1
       1    5     1
       3    6    25      0       1
       3    7     1
       3    8    25      1       1
       3    9     1

I can do this in Excel by creating a counter and fetching the previous value. I tried to achieve the same in R using sum and shift(), but without success.

You can use dplyr...

library(dplyr)
# First set up grouping variables based on the runs before and after each 25
d1 <- d1 %>% mutate(PrevGp = cumsum(lag(value == 25, default = TRUE)),
                    AfterGp = cumsum(value == 25)) %>% 
# Use these to calculate the counts for each run; sum(value) - 25 is the
# number of 1s because every other value in a run is 1
  group_by(groupID, PrevGp) %>% mutate(Prev1s = sum(value) - 25) %>% 
  group_by(groupID, AfterGp) %>% mutate(After1s = sum(value) - 25) %>% 
  ungroup() %>% 
# Blank out the counts (set to "") on rows where value != 25;
# note that this turns both columns into character
  mutate(Prev1s = replace(Prev1s, value != 25, ""),
         After1s = replace(After1s, value != 25, "")) %>% 
# Finally, remove the helper grouping variables
  select(-c(PrevGp, AfterGp))

d1
# A tibble: 9 x 5
  groupID  date value Prev1s After1s
    <dbl> <dbl> <dbl>  <chr>   <chr>
1       1     1     1               
2       1     2     1               
3       1     3    25      2       2
4       1     4     1               
5       1     5     1               
6       3     6    25      0       1
7       3     7     1               
8       3     8    25      1       1
9       3     9     1               
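
To see why the two helper columns work, it may help to look at them on their own. The following is a minimal sketch in base R (so it runs without dplyr); `is25` is a name introduced here for illustration, and `c(TRUE, head(is25, -1))` is assumed to play the role of `lag(value == 25, default = TRUE)`:

```r
value <- c(1, 1, 25, 1, 1, 25, 1, 25, 1)
is25  <- value == 25

# Shifting the flag down by one row makes each 25 the LAST row of its run,
# so a run holds a 25 together with the values that precede it.
PrevGp  <- cumsum(c(TRUE, head(is25, -1)))

# Without the shift, each 25 is the FIRST row of its run,
# so a run holds a 25 together with the values that follow it.
AfterGp <- cumsum(is25)

PrevGp   # 1 1 1 2 2 2 3 3 4
AfterGp  # 0 0 1 1 1 2 2 3 3
```

These run ids are computed across the whole data frame, which is why the answer groups by groupID as well as by PrevGp or AfterGp: that combination stops a run from crossing a group boundary (rows 4 to 6 share PrevGp = 2 but belong to different groups).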

An alternative using the data.table package combined with the rle function (applied to the original d1):

library(data.table)
setDT(d1)[, c('prev1s','after1s') := {p <- a <- rle(value);
                                      i <- p$values == 25;
                                      p$values[i] <- shift(p$lengths, fill = 0)[i];
                                      a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i];
                                      p$values[!i] <- a$values[!i] <- NA;
                                      list(inverse.rle(p),inverse.rle(a))},
          by = groupID][]

which gives:

   groupID date value prev1s after1s
1:       1    1     1     NA      NA
2:       1    2     1     NA      NA
3:       1    3    25      2       2
4:       1    4     1     NA      NA
5:       1    5     1     NA      NA
6:       3    6    25      0       1
7:       3    7     1     NA      NA
8:       3    8    25      1       1
9:       3    9     1     NA      NA
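
The rle step can be easier to follow on a single group. Below is a rough base R sketch for the values of groupID 1; `prev_len` and `next_len` are illustrative names standing in for the two `shift()` calls, and, like the answer, it assumes every value other than 25 is a 1:

```r
v <- c(1, 1, 25, 1, 1)        # values of groupID 1
r <- rle(v)                   # lengths: 2 1 2, values: 1 25 1
is25 <- r$values == 25

# Length of the run just before / just after each run (0 at the edges)
prev_len <- c(0, head(r$lengths, -1))   # 0 2 1
next_len <- c(tail(r$lengths, -1), 0)   # 1 2 0

# Keep the neighbouring run length only where the run is a 25, NA elsewhere
p <- a <- r
p$values <- ifelse(is25, prev_len, NA)
a$values <- ifelse(is25, next_len, NA)

# Expand the runs back to one entry per original row
prev1s  <- inverse.rle(p)     # NA NA 2 NA NA
after1s <- inverse.rle(a)     # NA NA 2 NA NA
```

Because rle merges adjacent equal values into one run, two consecutive 25s would be treated as a single run here, which is something to check if that can occur in your data.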