How to extract start and end dates from a single column in a large dataset in R using lubridate?

Question

I want to do something very similar to what is being done in this question. I have a large data.table (or data.frame) with a single column that holds a Base Time Stamp (BST). I need to determine the number of days for each unique ID, where each ID may span many tens of thousands of rows. All the lubridate tutorials I'm finding start with the very simple start-to-end example (a great intro, but not the answer I'm looking for).

Basically, I need to go through my BST column and determine the start and end date for each ID.

Sample data:

library(data.table)

myID <- c(1,1,1,1,1,1,2,2,2,2,2,2)
BST <- c("2017-06-01 00:00:01", "2017-06-01 00:00:02",
         "2017-06-02 00:00:01", "2017-06-02 00:00:02", 
         "2017-06-03 00:00:01", "2017-06-03 00:00:02",
         "2017-06-01 00:00:01", "2017-06-01 00:00:02", 
         "2017-06-03 00:00:01", "2017-06-03 00:00:02", 
         "2017-06-05 00:00:01", "2017-06-05 00:00:02")
V3 <- c("a", "a", "a", "a", "a", "a", "b", "b", "b","b", "b", "b")
dt1 <- data.table(myID, BST, V3)

Desired result:

And how would this be done while keeping all of the original rows, a la dplyr::mutate()?

Second desired result:

Answer 1

You could try converting BST to date-time with lubridate::ymd_hms, then grouping by myID and taking the minimum of BST as startDates and the maximum of BST as endDates.

library(data.table)
library(lubridate)
dt1[,.(startDates= min(ymd_hms(BST)), endDates = max(ymd_hms(BST))), by=myID]
#   myID          startDates            endDates
#1:    1 2017-06-01 00:00:01 2017-06-03 00:00:02
#2:    2 2017-06-01 00:00:01 2017-06-05 00:00:02
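The answer above returns the start and end timestamps but not the number of days, and it does not cover the row-preserving (dplyr::mutate-style) version asked about in the question. Below is a minimal sketch of both with data.table, assuming "number of days" means the difference between the calendar dates of the first and last timestamp (add 1 if you want an inclusive count). The names summ and numDays are hypothetical, and parsing BST once with := instead of calling ymd_hms inside every group is a suggestion aimed at large tables, not part of the original answer.

```r
library(data.table)
library(lubridate)

# Sample data from the question
myID <- c(1,1,1,1,1,1,2,2,2,2,2,2)
BST <- c("2017-06-01 00:00:01", "2017-06-01 00:00:02",
         "2017-06-02 00:00:01", "2017-06-02 00:00:02",
         "2017-06-03 00:00:01", "2017-06-03 00:00:02",
         "2017-06-01 00:00:01", "2017-06-01 00:00:02",
         "2017-06-03 00:00:01", "2017-06-03 00:00:02",
         "2017-06-05 00:00:01", "2017-06-05 00:00:02")
V3 <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b")
dt1 <- data.table(myID, BST, V3)

# Parse the character timestamps once, replacing the column in place (POSIXct, UTC)
dt1[, BST := ymd_hms(BST)]

# One row per ID: first and last timestamp
summ <- dt1[, .(startDates = min(BST), endDates = max(BST)), by = myID]
# Hypothetical numDays: difference in calendar days between start and end
summ[, numDays := as.integer(as.Date(endDates) - as.Date(startDates))]
print(summ)

# Keep all original rows (a la dplyr::mutate): := adds the columns by reference, per group
dt1[, `:=`(startDates = min(BST), endDates = max(BST)), by = myID]
dt1[, numDays := as.integer(as.Date(endDates) - as.Date(startDates))]
print(dt1)
```

With the sample data, both versions give 2 days for ID 1 and 4 days for ID 2; the second keeps all 12 rows and repeats each ID's values on every one of its rows.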
