Grouping and Counting by Dates (R)

I am working in the R programming language. I have a data frame that looks like this:

    startdate <- c('2010-01-01', '2010-01-01', '2010-01-01', '2010-01-02', '2010-01-03', '2010-01-03')
    event <- c(1, 1, 1, 1, 1, 1)
    my_data <- data.frame(startdate, event)

       startdate event
    1 2010-01-01     1
    2 2010-01-01     1
    3 2010-01-01     1
    4 2010-01-02     1
    5 2010-01-03     1
    6 2010-01-03     1

Note: the actual values of "startdate" are "POSIXct", formatted as year-month-day.

I am trying to compute the cumulative sum of "event" by the "startdate" column. The result should look like this:

    startdate <- c('2010-01-01', '2010-01-02', '2010-01-03')
    event <- c(3, 4, 6)
    my_data_2 <- data.frame(startdate, event)

    # desired result
       startdate event
    1 2010-01-01     3
    2 2010-01-02     4
    3 2010-01-03     6

I tried to do this with the "dplyr" library:

    library(dplyr)

    new_file <- my_data %>% group_by(startdate) %>% mutate(cumsum_value = cumsum(event))

But this returns something slightly different from what I intended:

      startdate  event cumsum_value
      <chr>      <dbl>        <dbl>
    1 2010-01-01     1            1
    2 2010-01-01     1            2
    3 2010-01-01     1            3
    4 2010-01-02     1            1
    5 2010-01-03     1            1
    6 2010-01-03     1            2

Can someone please tell me how to fix this?

Thanks

    my_data %>%
      mutate(cumsum = cumsum(event)) %>%
      group_by(startdate) %>%
      summarise(max(cumsum))

    # A tibble: 3 × 2
      startdate  `max(cumsum)`
      <chr>              <dbl>
    1 2010-01-01             3
    2 2010-01-02             4
    3 2010-01-03             6
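
The question notes that the real startdate column is POSIXct rather than character. Below is a minimal sketch of the same idea on such a column. The UTC time zone, the midnight timestamps, the `arrange()` step, and the names `my_data_ct` and `result_ct` are assumptions for illustration, not something the question specifies.

```r
library(dplyr)

# Assumed reconstruction of the sample data with POSIXct dates (UTC is an assumption)
my_data_ct <- data.frame(
  startdate = as.POSIXct(c("2010-01-01", "2010-01-01", "2010-01-01",
                           "2010-01-02", "2010-01-03", "2010-01-03"),
                         tz = "UTC"),
  event = c(1, 1, 1, 1, 1, 1)
)

result_ct <- my_data_ct %>%
  arrange(startdate) %>%               # make the running total follow date order
  mutate(event = cumsum(event)) %>%    # running total over all rows, before grouping
  group_by(startdate) %>%
  summarise(event = max(event))        # largest running total within each date

result_ct
```

If the real timestamps also carry a time of day, grouping on `as.Date(startdate)` instead would probably be needed to get one row per day.
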

  1. mutate the event column to hold its cumsum
  2. group_by startdate
  3. summarise with max(event)

```
library(dplyr)
my_data %>%
    mutate(event = cumsum(event)) %>%
    group_by(startdate) %>%
    summarise(event = max(event))
```
```
  startdate  event
  <chr>      <dbl>
1 2010-01-01     3
2 2010-01-02     4
3 2010-01-03     6
```

Another option is to use duplicated, which avoids the group_by. Also, if the 'event' column contains only 1s, then instead of cumsum we can use the built-in function row_number() to create the sequence:

```
library(dplyr)
my_data %>%
   mutate(event = row_number()) %>%
   filter(!duplicated(startdate, fromLast = TRUE))
```
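
To see why the duplicated approach works: `duplicated(x, fromLast = TRUE)` flags every occurrence of a value except its last one, so negating it keeps only the final row for each date, and that row carries the running total up to that date. Here is a small base R sketch on the sample data. The names `keep_last` and `result_base` are only illustrative, and it assumes every event equals 1 and the rows are already in date order.

```r
startdate <- c("2010-01-01", "2010-01-01", "2010-01-01",
               "2010-01-02", "2010-01-03", "2010-01-03")

# TRUE marks the last row of each date
keep_last <- !duplicated(startdate, fromLast = TRUE)
keep_last

# Row index as the running count (valid only because each event is 1),
# then keep the last row per date
result_base <- data.frame(startdate, event = seq_along(startdate))[keep_last, ]
result_base
```

Unlike the group_by version, this keeps the rows in their original order rather than sorting by startdate.
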