Count of values with a condition and group
I have a data frame, sorted by groupID and date:
d1 <- data.frame(groupID = c(1,1,1,1,1,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(1,1,25,1,1,25,1,25,1))
> d1
groupID date value
1 1 1
1 2 1
1 3 25
1 4 1
1 5 1
3 6 25
3 7 1
3 8 25
3 9 1
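I want to create two new columns:
- For every occurrence of 25, the count of values = 1 that precede it, within each group
- For every occurrence of 25, the count of values = 1 that follow it, before the next value = 25, within each group
Desired output: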
groupID date value Prev1s After1s
1 1 1
1 2 1
1 3 25 2 2
1 4 1
1 5 1
3 6 25 0 1
3 7 1
3 8 25 1 1
3 9 1
I can do this in Excel by creating a counter and looking up the previous values. I tried to achieve the same in R using sum and shift(), but without success.
You can use dplyr ...
library(dplyr)
# First set up grouping variables based on the runs before and after each 25
d1 <- d1 %>%
  mutate(PrevGp  = cumsum(lag(value == 25, default = TRUE)),
         AfterGp = cumsum(value == 25)) %>%
  # Use these to calculate the values you want for each group
  # (sum(value) - 25 equals the number of 1s only because every other value is 1)
  group_by(groupID, PrevGp)  %>% mutate(Prev1s  = sum(value) - 25) %>%
  group_by(groupID, AfterGp) %>% mutate(After1s = sum(value) - 25) %>%
  ungroup() %>%
  # Blank out (set to "") every row where value != 25; both columns become character
  mutate(Prev1s  = replace(Prev1s,  value != 25, ""),
         After1s = replace(After1s, value != 25, "")) %>%
  # And remove the grouping variables
  select(-c(PrevGp, AfterGp))
d1
# A tibble: 9 x 5
groupID date value Prev1s After1s
<dbl> <dbl> <dbl> <chr> <chr>
1 1 1 1
2 1 2 1
3 1 3 25 2 2
4 1 4 1
5 1 5 1
6 3 6 25 0 1
7 3 7 1
8 3 8 25 1 1
9 3 9 1
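The pipeline above blanks non-25 rows with "", which turns both result columns into character, and it relies on every non-25 value being exactly 1. If numeric columns are preferred, a variant along the following lines might work. This is only a sketch: the helper column is25 and the result name res are made up for illustration, and it counts 1s with sum(value == 1) rather than subtracting 25 from the sum.

```r
library(dplyr)

# Rebuild the example data so the sketch does not depend on earlier steps
d1 <- data.frame(groupID = c(1, 1, 1, 1, 1, 3, 3, 3, 3),
                 date    = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 value   = c(1, 1, 25, 1, 1, 25, 1, 25, 1))

res <- d1 %>%
  mutate(is25    = value == 25,                        # illustrative helper column
         # a new "previous" run starts on the row after each 25
         PrevGp  = cumsum(lag(is25, default = TRUE)),
         # a new "after" run starts on each 25 itself
         AfterGp = cumsum(is25)) %>%
  group_by(groupID, PrevGp) %>%
  # count the 1s in the run that ends at this 25; NA on all other rows
  mutate(Prev1s = if_else(is25, sum(value == 1), NA_integer_)) %>%
  group_by(groupID, AfterGp) %>%
  # count the 1s in the run that starts at this 25; NA on all other rows
  mutate(After1s = if_else(is25, sum(value == 1), NA_integer_)) %>%
  ungroup() %>%
  select(-is25, -PrevGp, -AfterGp)

res
```

Here Prev1s and After1s stay integer, with NA on the rows where value is not 25, so they can be summed or filtered without converting back from text.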
An alternative using the data.table package in combination with the rle function:
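library(data.table)
setDT(d1)[, c('prev1s','after1s') := {
    # run-length encode value; p and a start as two copies of the same encoding
    p <- a <- rle(value)
    # which runs consist of 25s
    i <- p$values == 25
    # for each run of 25s, take the length of the run just before / just after it
    p$values[i] <- shift(p$lengths, fill = 0)[i]
    a$values[i] <- shift(a$lengths, type = 'lead', fill = 0)[i]
    # every other run gets NA
    p$values[!i] <- a$values[!i] <- NA
    # expand both encodings back to one value per row
    list(inverse.rle(p), inverse.rle(a))
  }, by = groupID][]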
which gives:
groupID date value prev1s after1s
1: 1 1 1 NA NA
2: 1 2 1 NA NA
3: 1 3 25 2 2
4: 1 4 1 NA NA
5: 1 5 1 NA NA
6: 3 6 25 0 1
7: 3 7 1 NA NA
8: 3 8 25 1 1
9: 3 9 1 NA NA
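To see what rle and inverse.rle contribute in the approach above, it may help to run the same idea on a single group's value vector in base R. The function name adjacent_run_counts is hypothetical, chosen only for this illustration. Like the data.table code, the sketch reports the length of the neighbouring run, so it assumes every non-25 value is 1 and that two 25s never sit next to each other.

```r
# Hypothetical helper: for each occurrence of `target`, report the length of
# the run immediately before and immediately after it (NA on all other rows).
adjacent_run_counts <- function(v, target = 25) {
  r <- rle(v)                        # runs of identical consecutive values
  is_target <- r$values == target

  # length of the previous / next run, with 0 at the edges of the vector
  prev_len <- c(0, head(r$lengths, -1))
  next_len <- c(tail(r$lengths, -1), 0)

  p <- a <- r
  p$values <- ifelse(is_target, prev_len, NA)
  a$values <- ifelse(is_target, next_len, NA)

  # inverse.rle() expands the run-level values back to one value per element
  list(prev = inverse.rle(p), after = inverse.rle(a))
}

# value column of groupID == 1 from the example
g1 <- adjacent_run_counts(c(1, 1, 25, 1, 1))
# value column of groupID == 3 from the example
g3 <- adjacent_run_counts(c(25, 1, 25, 1))

g1
g3
```

For the first group, rle yields runs of lengths 2, 1, 2 with values 1, 25, 1, so the single 25 picks up 2 from each side. The data.table version does the same per group via by = groupID, using shift in place of the head/tail arithmetic.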