Grouping and Counting by Dates (R)
I am working with the R programming language. I have a data frame that looks like this:
startdate <- c('2010-01-01','2010-01-01','2010-01-01', '2010-01-02','2010-01-03','2010-01-03')
event <- c(1,1,1,1,1,1)
my_data <- data.frame(startdate, event)
   startdate event
1 2010-01-01     1
2 2010-01-01     1
3 2010-01-01     1
4 2010-01-02     1
5 2010-01-03     1
6 2010-01-03     1
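Note: the actual values of "startdate" are "POSIXct", written as "year-month-day".
I am trying to compute the cumulative sum of "event" by the "startdate" column, with one row per date. The result should look like this: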
startdate <- c('2010-01-01', '2010-01-02' ,'2010-01-03')
event <- c(3,4,6)
my_data_2 <- data.frame(startdate, event)
#desired file
   startdate event
1 2010-01-01     3
2 2010-01-02     4
3 2010-01-03     6
I tried to do this with the "dplyr" library:
library(dplyr)
new_file = my_data %>% group_by(startdate) %>% mutate(cumsum_value = cumsum(event))
But this returns something slightly different from what I intended, because the cumulative sum restarts within each date:
  startdate  event cumsum_value
  <chr>      <dbl>        <dbl>
1 2010-01-01     1            1
2 2010-01-01     1            2
3 2010-01-01     1            3
4 2010-01-02     1            1
5 2010-01-03     1            1
6 2010-01-03     1            2
Can someone please show me how to fix this?
Thanks
my_data %>%
  mutate(cumsum = cumsum(event)) %>%
  group_by(startdate) %>%
  summarise(max(cumsum))
# A tibble: 3 × 2
  startdate  `max(cumsum)`
  <chr>              <dbl>
1 2010-01-01             3
2 2010-01-02             4
3 2010-01-03             6
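We can mutate the event column to hold its cumsum over the whole data frame, then group_by startdate and summarise with max(event). Assuming the rows are in date order and event is never negative, the maximum running total within each date is the cumulative count up to and including that date.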
```
library(dplyr)
my_data %>%
  mutate(event = cumsum(event)) %>%
  group_by(startdate) %>%
  summarise(event = max(event))
```
```
  startdate  event
  <chr>      <dbl>
1 2010-01-01     3
2 2010-01-02     4
3 2010-01-03     6
```
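The approach above relies on the rows already being in date order, since cumsum runs over the rows as they appear. As a hedged alternative sketch (not part of the original answer), the events can be summed per date first and accumulated afterwards. The name shuffled_data and its row order are made up for illustration; count() with its wt and name arguments is standard dplyr, and its output comes back sorted by startdate.

```r
library(dplyr)

# Hypothetical input: the same dates as in the question, deliberately shuffled
# to show that this variant does not depend on the original row order.
shuffled_data <- data.frame(
  startdate = c("2010-01-03", "2010-01-01", "2010-01-01",
                "2010-01-02", "2010-01-03", "2010-01-01"),
  event = c(1, 1, 1, 1, 1, 1)
)

result <- shuffled_data %>%
  # one row per date; wt = event sums the event column instead of counting rows
  count(startdate, wt = event, name = "event") %>%
  # running total across the (sorted) dates
  mutate(event = cumsum(event))

print(result)
```

Because the per-date totals are computed before the running total, this version also behaves sensibly if event contains values other than 1.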
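Another option is to use duplicated, which avoids group_by. Also, if the 'event' column contains only 1s, then instead of cumsum we can use the built-in function row_number() to create the sequence, and keep the last row of each date: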
library(dplyr)
my_data %>%
  mutate(event = row_number()) %>%
  filter(!duplicated(startdate, fromLast = TRUE))
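The question notes that the real startdate column is POSIXct. If those timestamps carry a time of day (an assumption; the question only shows dates), grouping on them directly would create one group per timestamp rather than per day. A minimal sketch of one way to handle that is to convert to Date and sort before accumulating. The data frame my_data_ct, its times, and the UTC time zone are invented for illustration.

```r
library(dplyr)

# Hypothetical data: the same days as in the question, stored as POSIXct
# with made-up times of day and an assumed UTC time zone.
my_data_ct <- data.frame(
  startdate = as.POSIXct(
    c("2010-01-01 08:00:00", "2010-01-01 12:30:00", "2010-01-01 23:00:00",
      "2010-01-02 09:15:00", "2010-01-03 07:45:00", "2010-01-03 18:00:00"),
    tz = "UTC"
  ),
  event = c(1, 1, 1, 1, 1, 1)
)

result_ct <- my_data_ct %>%
  mutate(startdate = as.Date(startdate, tz = "UTC")) %>%  # drop the time of day
  arrange(startdate) %>%                                   # put rows in date order
  mutate(event = cumsum(event)) %>%                        # running total over all rows
  group_by(startdate) %>%
  summarise(event = max(event))                            # last running total per day

print(result_ct)
```

The time zone passed to as.Date should match the one the timestamps are meant to be read in, otherwise events near midnight can land on the neighbouring day.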