dplyr & lubridate - filter by hour and minute (i.e. 10:15)
Using dplyr and lubridate:
I've seen many posts on how to filter by hour, i.e. filter(hour(Timestamp) > 7), but I'd like to filter between 9:00 am and 8:15 pm of each day (regardless of the date, although here it is only 1/1/2015).
Take this example data (df), also available here: https://drive.google.com/file/d/10x_VY2stTRuVwz7QFWXMbaRqNqMRBeIk/view?usp=sharing:
ID timestamp value
a 1/1/2015 8:45 1
a 1/1/2015 9:00 2
a 1/1/2015 9:15 3
a 1/1/2015 9:30 4
a 1/1/2015 9:45 5
a 1/1/2015 10:00 6
a 1/1/2015 10:15 7
a 1/1/2015 19:45 11
a 1/1/2015 20:00 12
a 1/1/2015 20:15 13
a 1/1/2015 20:30 14
a 1/1/2015 20:45 14
b 1/1/2015 8:45 1
b 1/1/2015 9:00 2
b 1/1/2015 9:15 4
b 1/1/2015 9:30 5
b 1/1/2015 9:45 5
b 1/1/2015 10:00 5
b 1/1/2015 10:15 5
b 1/1/2015 19:45 5
b 1/1/2015 20:00 5
b 1/1/2015 20:15 5
b 1/1/2015 20:30 2
b 1/1/2015 20:45 2
Using the following code:
df %>% group_by(ID) %>%
  filter(hour(timestamp)>=9 & hour(timestamp)<21 & minute(timestamp)<16) %>%
  summarise(mean = mean(value)) %>% as.data.frame()
Yields:
ID mean
1 a 7.166667
2 b 4.333333
This excludes every timestamp whose minute(Timestamp) is greater than 15 (incorrectly dropping 9:45am, for example).
The correct result should be:
ID mean
1 a 7
2 b 4.555
Any ideas?
> head(dput(df))
structure(list(ID = c("a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b"), timestamp = structure(c(1420119900, 1420120800,
1420121700, 1420122600, 1420123500, 1420124400, 1420125300, 1420159500,
1420160400, 1420161300, 1420162200, 1420163100, 1420119900, 1420120800,
1420121700, 1420122600, 1420123500, 1420124400, 1420125300, 1420159500,
1420160400, 1420161300, 1420162200, 1420163100), class = c("POSIXct",
"POSIXt"), tzone = ""), value = c(1L, 2L, 3L, 4L, 5L, 6L, 7L,
11L, 12L, 13L, 14L, 14L, 1L, 2L, 4L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 2L, 2L)), .Names = c("ID", "timestamp", "value"), class = "data.frame", row.names = c(NA,
-24L))
ID timestamp value
1 a 2015-01-01 08:45:00 1
2 a 2015-01-01 09:00:00 2
3 a 2015-01-01 09:15:00 3
4 a 2015-01-01 09:30:00 4
5 a 2015-01-01 09:45:00 5
6 a 2015-01-01 10:00:00 6
I don't think you need the hour conversion. Here I filter for times between 11 am and 10 pm:
df %>%
group_by(ID) %>%
filter(timestamp > '2015-01-01 11:00:00' & timestamp < '2015-01-01 22:00:00') %>%
summarise(mean = mean(value)) %>% as.data.frame()
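The comparison above is tied to a single date. To apply the same idea to every day, one possibility is to compare only the time-of-day part as a zero-padded "HH:MM" string, which sorts in the same order as clock time. A minimal sketch, assuming a small made-up data frame `demo` with timestamps in UTC (neither the data nor the `tod` column is from the question):

```r
library(dplyr)

# Hypothetical sample spanning two days; tz is fixed to UTC so the
# result does not depend on the machine's local time zone
demo <- data.frame(
  ID = c("a", "a", "a", "a", "a"),
  timestamp = as.POSIXct(
    c("2015-01-01 08:45:00", "2015-01-01 09:00:00", "2015-01-01 20:15:00",
      "2015-01-02 09:45:00", "2015-01-02 20:30:00"),
    tz = "UTC"
  ),
  value = c(1, 2, 13, 5, 14)
)

kept <- demo %>%
  # format() yields zero-padded "HH:MM", so string order matches clock order
  mutate(tod = format(timestamp, "%H:%M")) %>%
  # keep 09:00 through 20:15 (inclusive) on any date
  filter(tod >= "09:00" & tod <= "20:15")
```

Here `kept` holds the 09:00 and 20:15 rows from the first day and the 09:45 row from the second, since the date part no longer takes part in the comparison.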
Another option is to compute the number of minutes since midnight. To fall between 09:00 and 20:15, the minutes since midnight need to be between 9*60 and 20*60 + 15:
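library(dplyr)
library(lubridate)
df %>%
  # minutes elapsed since midnight, e.g. 09:45 -> 585
  mutate(since_midnight = hour(timestamp) * 60 + minute(timestamp)) %>%
  # <= keeps the 20:15 rows, which the expected means in the question include
  filter(since_midnight >= 9*60 & since_midnight <= (20 * 60 + 15)) %>%
  # group by ID to get one mean per ID, as in the question
  group_by(ID) %>%
  summarise(mean = mean(value))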
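To see why folding hour and minute into a single number matters, it may help to compare the two conditions on a few clock times. This is only an illustrative sketch in base R; the vectors `h` and `m` are made up for the comparison and stand for 9:15, 9:45, 20:15 and 20:30:

```r
# Hypothetical hour and minute components of four clock times
h <- c(9, 9, 20, 20)
m <- c(15, 45, 15, 30)

# The question's filter tests hour and minute independently, so the
# minute test applies to every hour, not only to the last one
separate <- h >= 9 & h < 21 & m < 16

# A single minutes-since-midnight value encodes the whole clock time
since_midnight <- h * 60 + m
combined <- since_midnight >= 9 * 60 & since_midnight <= 20 * 60 + 15

data.frame(h, m, since_midnight, separate, combined)
```

With the separate tests, 9:45 is rejected because its minute is above 15 even though it lies well inside the window; the combined test keeps it and still rejects 20:30.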