Occurrence of some event by ID and Date
I have a data frame that looks like this:
<ID> <Event> <Date>
1 Ate 2021-01-01
1 Drank 2021-01-01
1 Ate 2021-02-23
2 Ate 2021-01-02
2 Ran 2021-01-02
2 Ate 2021-02-23
3 Drank 2021-01-01
3 Ran 2021-02-23
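What I want to determine is whether an event occurred on a given date for each ID. In this case, I want to identify, for each ID and date group, whether that ID "Ate".
The expected outcome is a table that looks like: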
<ID> <Event> <Date> <Outcome>
1 Ate 2021-01-01 Yes
1 Drank 2021-01-01 Yes
1 Jumped 2021-02-23 No
2 Ate 2021-01-02 Yes
2 Ran 2021-01-02 Yes
2 Ate 2021-02-23 No
3 Drank 2021-01-01 No
3 Ran 2021-02-23 No
I hope this makes sense, and thank you for your help!
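If we want to check whether 'Ate' occurs together with another 'Event' for each 'ID', 'Date', group by 'ID', 'Date', then check that the number of rows (n()) is greater than 1 and (&) that 'Ate' is found %in% 'Event'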
library(dplyr)
df1 %>%
    group_by(ID, Date) %>%
    mutate(Outcome = c("No", "Yes")[(n() > 1 & 'Ate' %in% Event) + 1]) %>%
    ungroup()
-output
# A tibble: 8 x 4
# ID Event Date Outcome
# <int> <chr> <chr> <chr>
#1 1 Ate 2021-01-01 Yes
#2 1 Drank 2021-01-01 Yes
#3 1 Ate 2021-02-23 No
#4 2 Ate 2021-01-02 Yes
#5 2 Ran 2021-01-02 Yes
#6 2 Ate 2021-02-23 No
#7 3 Drank 2021-01-01 No
#8 3 Ran 2021-02-23 No
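To handle the case where a group contains duplicate 'Ate' rows and no other values, we can use n_distinct (instead of n()), i.e. check that the number of distinct elements of 'Event' is greater than 1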
df1 %>%
    group_by(ID, Date) %>%
    mutate(Outcome = c("No", "Yes")[(n_distinct(Event) > 1 &
          'Ate' %in% Event) + 1]) %>%
    ungroup()
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L), Event = c("Ate",
"Drank", "Ate", "Ate", "Ran", "Ate", "Drank", "Ran"), Date = c("2021-01-01",
"2021-01-01", "2021-02-23", "2021-01-02", "2021-01-02", "2021-02-23",
"2021-01-01", "2021-02-23")), class = "data.frame", row.names = c(NA,
-8L))
Here is a data.table option
library(data.table)
setDT(df)[, Outcome := c("No", "Yes")[1 + isTRUE(Date == Date[Event == "Ate"] & .N > 1)], .(ID, Date)]
gives
ID Event Date Outcome
1: 1 Ate 2021-01-01 Yes
2: 1 Drank 2021-01-01 Yes
3: 1 Ate 2021-02-23 No
4: 2 Ate 2021-01-02 Yes
5: 2 Ran 2021-01-02 Yes
6: 2 Ate 2021-02-23 No
7: 3 Drank 2021-01-01 No
8: 3 Ran 2021-02-23 No
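For comparison, the same per-group rule (more than one row in the 'ID', 'Date' group, and 'Ate' among the events) can probably also be written in base R with ave(). This is only a minimal sketch and not part of either answer above; flag_group is a hypothetical helper name, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Same data as df1 above, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Event = c("Ate", "Drank", "Ate", "Ate", "Ran", "Ate", "Drank", "Ran"),
  Date = c("2021-01-01", "2021-01-01", "2021-02-23", "2021-01-02",
           "2021-01-02", "2021-02-23", "2021-01-01", "2021-02-23"),
  stringsAsFactors = FALSE
)

# flag_group (hypothetical helper): "Yes" for every row of a group that has
# more than one row and contains 'Ate', otherwise "No"
flag_group <- function(e) {
  rep(if (length(e) > 1 && "Ate" %in% e) "Yes" else "No", length(e))
}

# ave() applies flag_group within each ID/Date combination and
# returns the results in the original row order
df1$Outcome <- ave(df1$Event, df1$ID, df1$Date, FUN = flag_group)
df1
```

To ignore groups that contain only repeated 'Ate' rows, length(e) could be swapped for length(unique(e)) inside the condition, mirroring the n_distinct variant.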