Counting set accumulations of elapsed time with lubridate

I have a dataset with a date-time column. For each unique ID, I need to count the number of distinct 4-hour periods. This is what I have so far:

library(data.table)
library(lubridate)

# Fake data
myID <- c(1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2)
timeStamp1 <- c("2017-08-01 00:01:00", "2017-08-01 00:02:00", "2017-08-01 00:03:00", "2017-08-01 00:04:00", 
                "2017-08-01 03:00:00", "2017-08-01 05:00:00", "2017-08-01 05:01:00", "2017-08-01 05:02:00",
               "2017-08-01 01:00:00", "2017-08-01 04:00:00", "2017-08-01 04:59:00", "2017-08-01 05:00:01", 
               "2017-08-01 08:00:00", "2017-08-01 09:01:00", "2017-08-01 13:01:00", "2017-08-01 13:02:00")
df1 <- data.frame(myID, timeStamp1)
dt1 <- setDT(df1)

# Convert to date type
dt1 <- dt1[, BTS := ymd_hms(timeStamp1)]

# Order by myID and then timestamp
dt1 <- dt1[order(myID, BTS)]

# Create lagged time
dt1 <- dt1[, l_BTS := shift(BTS), by = myID]

# Create span variable (time since the previous row, in hours)
dt1 <- dt1[, spans1 := abs(difftime(BTS, l_BTS, units = "hours"))]

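I think this involves some combination of difftime(), as.duration(), and/or cumsum(), but I keep digging myself into a deeper hole. The desired output looks like this:

I thought the following would produce what I want, but I am clearly doing something wrong here: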

# Count distinct transits by 4 hour blocks
dt1 <- dt1[, tFlag := c(FALSE, diff(as.Date(BTS))) > .1666667, by = myID]
dt1 <- dt1[, t_Count := cumsum(tFlag), by = myID]
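
One likely problem with the attempt above is that as.Date() discards the time of day, so diff() over the dates is 0 whenever consecutive rows fall on the same calendar day. A minimal sketch of a gap-based flag that compares the timestamps themselves is shown below. It assumes a new block starts whenever more than 4 hours separate consecutive rows, which is only one reading of the question, and flag_gaps is a hypothetical helper name that is not part of the original post.

```r
# Hypothetical helper (not from the original post): flags rows that start a
# new block because more than `gap_hours` passed since the previous row.
# Assumes `times` is a POSIXct vector already sorted in ascending order.
flag_gaps <- function(times, gap_hours = 4) {
  # Gap to the previous timestamp in hours; the first row has no predecessor
  gaps <- c(0, as.numeric(diff(times), units = "hours"))
  gaps > gap_hours
}

# Small illustrative vector (made up for this sketch)
ts <- as.POSIXct(c("2017-08-01 01:00:00", "2017-08-01 04:00:00",
                   "2017-08-01 09:01:00", "2017-08-01 13:01:00"),
                 tz = "UTC")

flags <- flag_gaps(ts)   # FALSE FALSE TRUE FALSE
counts <- cumsum(flags)  # 0 0 1 1
print(counts)
```

With data.table this could be applied per ID as dt1[, tFlag := flag_gaps(BTS), by = myID]. Note that in the fake data above no gap strictly exceeds 4 hours (the 09:01 to 13:01 gap for ID 2 is exactly 4 hours), so every flag would be FALSE unless the comparison is changed to >=.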

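I'm not sure I understand what you mean, but if you need the difference between the earliest and latest timestamp within each myID group, you could do this: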

library(tidyverse)

# Per ID: span between the earliest and latest timestamp, in 4-hour units
dt1 %>% group_by(myID) %>% 
        summarise(min = min(BTS), 
                  max = max(BTS)) %>% 
        mutate(delta = difftime(max, min, units = "hours")/4,
               # Whole 4-hour blocks that fit inside the span
               transits = as.numeric(floor(delta)))

# A tibble: 2 x 5
   myID  min                  max                  delta             transits
  <dbl>  <dttm>               <dttm>               <time>               <dbl>
      1  2017-08-01 00:01:00  2017-08-01 05:02:00  1.25416666666667         1
      2  2017-08-01 01:00:00  2017-08-01 13:02:00  3.00833333333333         3
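
If the goal is instead to count how many times 4 hours of elapsed time accumulate within an ID, with the clock restarting each time the threshold is reached, a loop-based sketch could look like the following. This is one possible reading of the question rather than something the original post confirms, and count_accumulations is a hypothetical helper name.

```r
# Hypothetical helper (not from the original post): counts how many times at
# least `block_hours` of elapsed time accumulate, restarting the clock at the
# row where the threshold is reached.
count_accumulations <- function(times, block_hours = 4) {
  times <- sort(times)
  if (length(times) < 2) return(0L)
  count <- 0L
  start <- times[1]  # start of the current block
  for (i in seq_along(times)[-1]) {
    elapsed <- as.numeric(difftime(times[i], start, units = "hours"))
    if (elapsed >= block_hours) {
      count <- count + 1L
      start <- times[i]  # restart the clock at this row
    }
  }
  count
}

# Timestamps for the two IDs from the question's fake data
id1 <- as.POSIXct(c("2017-08-01 00:01:00", "2017-08-01 00:02:00",
                    "2017-08-01 00:03:00", "2017-08-01 00:04:00",
                    "2017-08-01 03:00:00", "2017-08-01 05:00:00",
                    "2017-08-01 05:01:00", "2017-08-01 05:02:00"),
                  tz = "UTC")
id2 <- as.POSIXct(c("2017-08-01 01:00:00", "2017-08-01 04:00:00",
                    "2017-08-01 04:59:00", "2017-08-01 05:00:01",
                    "2017-08-01 08:00:00", "2017-08-01 09:01:00",
                    "2017-08-01 13:01:00", "2017-08-01 13:02:00"),
                  tz = "UTC")

n1 <- count_accumulations(id1)  # 1
n2 <- count_accumulations(id2)  # 3
print(c(n1, n2))
```

On this sample the counts happen to match the floor((max - min) / 4) result above, but the two approaches can diverge when rows are sparse, because the loop restarts the clock at an observed timestamp rather than at fixed 4-hour boundaries. With data.table the helper could be applied per group as dt1[, .(transits = count_accumulations(BTS)), by = myID].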