How to group by timestamp in UTC by day in R

Question

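I have this sample of UTC timestamps along with a bunch of other data. I'd like to group my data by date. That means I don't need the hours/mins/secs, and I'd like a new df that shows the number of actions grouped together.

I tried using lubridate to extract the date, but I can't get the origin right.

Data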

hw0 <- read.table(text = 
'ID   timestamp        action
4f.. 20160305195246   visitPage
75.. 20160305195302   visitPage
77.. 20160305195312   checkin
42.. 20160305195322   checkin
8f.. 20160305195332   searchResultPage
29.. 20160305195342   checkin', header = TRUE)

Here is what I tried

library(dplyr)
library(lubridate) #this will allow us to extract the date
daily <- hw0 %>%
mutate(date=date(as.POSIXct(timestamp),origin='1970-01-01'))

daily <- daily %>%
group_by(date)

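I'm not sure what to use as the origin, and the error I get says the value is incorrect. Ultimately, I expect the code to return a new df with one variable (the date) holding a list of unique dates, along with how many of the different actions occur on each day.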

Answer 1

Assuming the trailing digits are a 24-hour time, you can use:

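daily = hw0 %>% 
  # tz = 'UTC' because the question states the timestamps are in UTC;
  # without it, as.POSIXct() interprets them in the local time zone
  mutate(date = as.POSIXct(as.character(timestamp), format = '%Y%m%d%H%M%S', tz = 'UTC'))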

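If you want to drop the time of day, you can use as.Date instead. When you give it a numeric argument, you need to supply an origin, and the number is interpreted as days since that origin. In your case, you should just give it a character vector and supply the date format.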
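As a rough sketch of that suggestion, the snippet below keeps only the first eight characters of each timestamp, converts them with as.Date and a format string, and then counts the actions per day. Reading the columns with colClasses = "character" and the names daily and n_actions are assumptions made for this illustration, not part of the original post.

```r
library(dplyr)

# Sample data from the question. colClasses = "character" is an assumption
# made here so the timestamps stay as text rather than large numbers.
hw0 <- read.table(text = 
'ID   timestamp        action
4f.. 20160305195246   visitPage
75.. 20160305195302   visitPage
77.. 20160305195312   checkin
42.. 20160305195322   checkin
8f.. 20160305195332   searchResultPage
29.. 20160305195342   checkin', header = TRUE, colClasses = "character")

daily <- hw0 %>%
  # Keep the yyyymmdd part and parse it as a Date (no origin needed for text)
  mutate(date = as.Date(substr(timestamp, 1, 8), format = "%Y%m%d")) %>%
  # One row per date/action pair, with the number of occurrences
  count(date, action, name = "n_actions")

print(daily)

# By contrast, a numeric input needs an origin and is read as a day count:
as.Date(16865, origin = "1970-01-01")  # 2016-03-05
```

With the sample data every row falls on 2016-03-05, so the result has one row per action type for that single day.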

Answer 2

lubridate also has the ymd_hms() function, which parses the timestamp into a date-time, and the floor_date() function, which rounds it down to the day.

library(tidyverse)
library(lubridate) # older tidyverse versions do not attach lubridate automatically
daily <- hw0 %>%
  mutate(time = ymd_hms(timestamp, tz = 'UTC'),
         date = floor_date(time, unit = 'day'))
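To get from the floored date to the per-day summary the question asks for, one possible continuation is to group on the date and summarise. This is only a sketch: the column names n_actions and n_distinct_actions, the as_date() conversion, and reading the data with colClasses = "character" are assumptions added for illustration.

```r
library(dplyr)
library(lubridate)

# Sample data from the question, read as text (an assumption for this sketch)
hw0 <- read.table(text = 
'ID   timestamp        action
4f.. 20160305195246   visitPage
75.. 20160305195302   visitPage
77.. 20160305195312   checkin
42.. 20160305195322   checkin
8f.. 20160305195332   searchResultPage
29.. 20160305195342   checkin', header = TRUE, colClasses = "character")

per_day <- hw0 %>%
  mutate(time = ymd_hms(timestamp, tz = "UTC"),
         # floor to midnight UTC, then drop the time part entirely
         date = as_date(floor_date(time, unit = "day"))) %>%
  group_by(date) %>%
  summarise(n_actions = n(),                         # rows per day
            n_distinct_actions = n_distinct(action)) # unique action types per day

print(per_day)
```

Whether the total count or the number of distinct action types is wanted depends on how the question is read, so both are shown.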

Answer 3

lubridate also has parse_date_time, which seems like a nice blend of the two solutions above.

library(tidyverse)
library(lubridate)

hw0 %>% 
  mutate(timestamp = parse_date_time(timestamp, orders = "YmdHMS"))


    ID           timestamp           action
1 4f.. 2016-03-05 19:52:46        visitPage
2 75.. 2016-03-05 19:53:02        visitPage
3 77.. 2016-03-05 19:53:12          checkin
4 42.. 2016-03-05 19:53:22          checkin
5 8f.. 2016-03-05 19:53:32 searchResultPage
6 29.. 2016-03-05 19:53:42          checkin
