Calculate maximum date interval - R

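I have a data.frame containing a grouping variable (id) and two date variables (start and stop). The date intervals are irregular, and I am trying to calculate the length in days of the uninterrupted interval that begins at the first start date of each group.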

Example data:

data <- data.frame(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10", 
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14", 
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13", 
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15", 
                   "2010-12-10", "2011-03-11"))
)

> data
   id      start       stop
1   1 2016-02-18 2016-02-19
2   2 2016-12-07 2016-12-12
3   2 2016-12-12 2016-12-13
4   3 2015-04-10 2015-04-13
5   3 2015-04-12 2015-04-22
6   3 2015-04-14 2015-05-13
7   3 2015-05-15 2015-07-13
8   3 2015-07-14 2015-07-15
9   4 2010-12-08 2010-12-10
10  5 2011-03-09 2011-03-11

The goal is a data.frame like this:

   id      start       stop duration_from_start
1   1 2016-02-18 2016-02-19                   2
2   2 2016-12-07 2016-12-12                   7
3   2 2016-12-12 2016-12-13                   7
4   3 2015-04-10 2015-04-13                  34
5   3 2015-04-12 2015-04-22                  34
6   3 2015-04-14 2015-05-13                  34
7   3 2015-05-15 2015-07-13                  34
8   3 2015-07-14 2015-07-15                  34
9   4 2010-12-08 2010-12-10                   3
10  5 2011-03-09 2011-03-11                   3

Or this:

  id      start       stop duration_from_start
1  1 2016-02-18 2016-02-19                   2
2  2 2016-12-07 2016-12-13                   7
3  3 2015-04-10 2015-05-13                  34
4  4 2010-12-08 2010-12-10                   3
5  5 2011-03-09 2011-03-11                   3

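It is important to detect the gap between rows 6 and 7 and to treat that point as the end of the maximum interval (34 days). Counting is inclusive: an interval from 2018-10-01 to 2018-10-01 counts as 1 day.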
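The inclusive counting rule can be checked with base R date arithmetic: subtracting two Date values gives the number of days between them, so adding 1 counts both endpoints. The helper name inclusive_days is hypothetical, used only for illustration.

```r
# Hypothetical helper: inclusive day count, so both the first and last day count
inclusive_days <- function(from, to) {
  as.numeric(as.Date(to) - as.Date(from)) + 1
}

inclusive_days("2018-10-01", "2018-10-01")  # same day: 1
inclusive_days("2015-04-10", "2015-05-13")  # first run of id 3: 34
```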

My usual lubridate approach (interval %within% lag(interval)) does not work for this example.

Any ideas?

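A data.table approach: within each id, find the first run of rows that do not start after the previous row's stop, then take that run's earliest start and latest stop.

library(magrittr)
library(data.table)
setDT(data)            # convert to a data.table by reference

# For one id: return the start and stop of the first uninterrupted run of rows
# (assumes rows are ordered by start within each id, as in the example)
first_int <- function(start, stop){
  # TRUE where a row starts after the previous row's stop, i.e. after a gap;
  # rleid() numbers the runs, and run 1 is everything before the first gap
  ind <- rleid((start - shift(stop, fill = Inf)) > 0) == 1
  list(start = min(start[ind]),
       stop  = max(stop[ind]))
}

newdata <- 
  data[, first_int(start, stop), by = id] %>% 
     .[, duration := stop - start + 1]   # +1 so that both endpoints are counted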


#    id      start       stop duration
# 1:  1 2016-02-18 2016-02-19   2 days
# 2:  2 2016-12-07 2016-12-13   7 days
# 3:  3 2015-04-10 2015-05-13  34 days
# 4:  4 2010-12-08 2010-12-10   3 days
# 5:  5 2011-03-09 2011-03-11   3 days
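The check above compares each row only with the immediately preceding row's stop. If one long interval covers several later, shorter ones (for example 2020-01-01 to 2020-01-31, then 2020-01-05 to 2020-01-06, 2020-01-10 to 2020-01-12 and 2020-01-20 to 2020-02-15), it reports a gap before the third row even though the first interval still covers it, and the run is cut short. Below is a sketch under the assumption that comparing against the running maximum of stop is the desired behaviour. It also writes the result onto every row, matching the first target layout. The helper first_run_days is hypothetical. As in the original, a row starting the day after the latest stop still counts as a gap; comparing with prev_max_stop + 1 instead would merge consecutive days.

```r
library(data.table)

data <- data.table(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10",
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14",
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13",
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15",
                   "2010-12-10", "2011-03-11"))
)

# Hypothetical helper: inclusive length in days of the first uninterrupted
# run of intervals, given the start/stop vectors of one group
first_run_days <- function(start, stop) {
  o <- order(start)                      # run detection needs chronological order
  start <- as.numeric(start[o])
  stop  <- as.numeric(stop[o])
  # latest stop seen before each row (-Inf before the first row)
  prev_max_stop <- c(-Inf, cummax(stop)[-length(stop)])
  gap <- start > prev_max_stop           # row begins after everything before it ended
  gap[1] <- FALSE                        # the first row never opens a gap
  in_run <- cumsum(gap) == 0             # rows before the first gap
  max(stop[in_run]) - start[1] + 1       # +1: both endpoints are counted
}

# Scalar per group, recycled to every row of that group
data[, duration_from_start := first_run_days(start, stop), by = id]
data
```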