Add a conditional variable to a data frame that takes into account the time intervals between rows of the same data frame
I have a data frame df1 that summarizes, at one-hour intervals, the number of times each animal was seen at a certain place.
For example:
df1<- data.frame(DateTime=c("2016-09-27 10:00:00","2016-09-27 10:00:00","2016-09-27 11:00:00","2016-09-27 11:00:00","2016-09-27 12:00:00","2016-09-27 12:00:00","2016-09-27 13:00:00","2016-09-27 13:00:00","2016-09-27 14:00:00","2016-09-27 14:00:00","2016-09-27 15:00:00","2016-09-27 15:00:00","2016-09-27 16:00:00","2016-09-27 16:00:00","2016-09-27 17:00:00","2016-09-27 17:00:00","2016-09-27 18:00:00","2016-09-27 18:00:00"),
AnimalID= c(8,9,8,9,8,9,8,9,8,9,8,9,8,9,8,9,8,9),
Times_seen=c(6,3,0,7,0,2,0,0,7,0,2,0,5,0,2,1,0,8))
> df1
DateTime AnimalID Times_seen
1 2016-09-27 10:00:00 8 6
2 2016-09-27 10:00:00 9 3
3 2016-09-27 11:00:00 8 0
4 2016-09-27 11:00:00 9 7
5 2016-09-27 12:00:00 8 0
6 2016-09-27 12:00:00 9 2
7 2016-09-27 13:00:00 8 0
8 2016-09-27 13:00:00 9 0
9 2016-09-27 14:00:00 8 7
10 2016-09-27 14:00:00 9 0
11 2016-09-27 15:00:00 8 2
12 2016-09-27 15:00:00 9 0
13 2016-09-27 16:00:00 8 5
14 2016-09-27 16:00:00 9 0
15 2016-09-27 17:00:00 8 2
16 2016-09-27 17:00:00 9 1
17 2016-09-27 18:00:00 8 0
18 2016-09-27 18:00:00 9 8
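From this, I want to add a new variable to df1 indicating whether the animal was likely present at the place (not seeing it does not mean it was not there). Obviously, if Times_seen is greater than 0, df1$Presence should be Yes. However, when Times_seen is 0, I want to consider two options: A) the animal was there but nobody saw it (then Presence is Yes), and B) the animal was not at the place (then Presence is No).
The criterion for considering that the animal is no longer at the place is: Times_seen is 0 for that animal in the current hour, and the animal has not been seen at this place in the previous two hours either.
As an example, this is what I would like to get: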
> df1
DateTime AnimalID Times_seen Presence
1 2016-09-27 10:00:00 8 6 Yes
2 2016-09-27 10:00:00 9 3 Yes
3 2016-09-27 11:00:00 8 0 Yes
4 2016-09-27 11:00:00 9 7 Yes
5 2016-09-27 12:00:00 8 0 Yes
6 2016-09-27 12:00:00 9 2 Yes
7 2016-09-27 13:00:00 8 0 No
8 2016-09-27 13:00:00 9 0 Yes
9 2016-09-27 14:00:00 8 7 Yes
10 2016-09-27 14:00:00 9 0 Yes
11 2016-09-27 15:00:00 8 2 Yes
12 2016-09-27 15:00:00 9 0 No
13 2016-09-27 16:00:00 8 5 Yes
14 2016-09-27 16:00:00 9 0 No
15 2016-09-27 17:00:00 8 2 Yes
16 2016-09-27 17:00:00 9 1 Yes
17 2016-09-27 18:00:00 8 0 Yes
18 2016-09-27 18:00:00 9 8 Yes
Does anyone know how to do this?
As akrun pointed out in one of his comments, this is the code I found useful:
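library(dplyr)      # mutate(), group_by(), between(), %>%
library(lubridate)  # ymd_hms(), hours()
library(purrr)      # map_lgl()
df1 <- df1 %>% mutate(DateTime = ymd_hms(DateTime)) %>%  # parse the strings as date-times
  group_by(AnimalID) %>%  # evaluate each animal separately
  mutate(Presence = map_lgl(DateTime, ~ any(Times_seen[dplyr::between(DateTime, .x - hours(2), .x + hours(0))] > 0)))  # TRUE if the animal was seen at any time from 2 hours before up to the current hour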
> df1
# A tibble: 18 x 4
# Groups: AnimalID [2]
DateTime AnimalID Times_seen Presence
<dttm> <dbl> <dbl> <lgl>
1 2016-09-27 10:00:00 8 6 TRUE
2 2016-09-27 10:00:00 9 3 TRUE
3 2016-09-27 11:00:00 8 0 TRUE
4 2016-09-27 11:00:00 9 7 TRUE
5 2016-09-27 12:00:00 8 0 TRUE
6 2016-09-27 12:00:00 9 2 TRUE
7 2016-09-27 13:00:00 8 0 FALSE
8 2016-09-27 13:00:00 9 0 TRUE
9 2016-09-27 14:00:00 8 7 TRUE
10 2016-09-27 14:00:00 9 0 TRUE
11 2016-09-27 15:00:00 8 2 TRUE
12 2016-09-27 15:00:00 9 0 FALSE
13 2016-09-27 16:00:00 8 5 TRUE
14 2016-09-27 16:00:00 9 0 FALSE
15 2016-09-27 17:00:00 8 2 TRUE
16 2016-09-27 17:00:00 9 1 TRUE
17 2016-09-27 18:00:00 8 0 TRUE
18 2016-09-27 18:00:00 9 8 TRUE
Note: the code lets you specify how many hours before and after each row to take into account before df1$Presence is set to No.
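The code above returns TRUE/FALSE, while the question asked for Yes/No, and the window sizes are hard-coded. A minimal sketch of one way to handle both is shown below. The helper name add_presence and its arguments before and after are hypothetical (they are not part of akrun's comment), and the sketch assumes DateTime is a character column in "YYYY-MM-DD HH:MM:SS" format.

```r
library(dplyr)
library(lubridate)
library(purrr)

# Same data as in the question, built more compactly.
df1 <- data.frame(
  DateTime   = rep(paste0("2016-09-27 ", 10:18, ":00:00"), each = 2),
  AnimalID   = rep(c(8, 9), times = 9),
  Times_seen = c(6, 3, 0, 7, 0, 2, 0, 0, 7, 0, 2, 0, 5, 0, 2, 1, 0, 8)
)

# Hypothetical helper: a row is "Yes" when the same animal has at least one
# sighting in the window [DateTime - before, DateTime + after] (in hours,
# both bounds inclusive), and "No" otherwise.
add_presence <- function(df, before = 2, after = 0) {
  df %>%
    mutate(DateTime = ymd_hms(DateTime)) %>%
    group_by(AnimalID) %>%
    mutate(
      Presence = map_lgl(
        DateTime,
        ~ any(Times_seen[between(DateTime, .x - hours(before), .x + hours(after))] > 0)
      ),
      # Relabel the logical result to match the Yes/No output requested.
      Presence = if_else(Presence, "Yes", "No")
    ) %>%
    ungroup()  # drop the grouping so later steps see a plain tibble
}

result <- add_presence(df1)
```

With the defaults (before = 2, after = 0) this should reproduce the Presence column requested in the question. Calling add_presence(df1, before = 1) would shrink the look-back window to one hour.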