Counting data.table entries between dates

I have a data.table containing a series of entries with start and end dates, like this:

id start end
1 1958-01-03 1962-10-11
2 1961-02-23 2012-04-28

and so on.

I want to count, month by month, how many of these entries were active. So I tried this:

data.table(
    month = seq(as.Date('1950-01-01','%Y-%m-%d'), as.Date('2021-09-01','%Y-%m-%d'), 'months'),
    month_end = seq(as.Date('1950-02-01','%Y-%m-%d'), as.Date('2021-10-01', '%Y-%m-%d'), 'months') -1
) %>%
    .[,count := satcat[start >= month & month_end <= end,.N]] %>%
    .[]

However, what I get is a pair of warnings:

Warning message in `>.default`(start, month):
“longer object length is not a multiple of shorter object length”
Warning message in `<=.default`(month_end, end):
“longer object length is not a multiple of shorter object length”

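and count is the same number for every row. Why does this happen, and what is the correct way to do it? I feel there should be some apply-style solution, but I can't work it out.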

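The warnings appear because satcat[...] inside j is evaluated once against the whole month and month_end columns, not row by row: vectors of different lengths get recycled, and the single .N that comes back is assigned to every row. This is a job for foverlaps: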

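library(data.table)

# Sample entries with start and end dates
DT <- data.table(id = 1:2,
                 start = as.Date(c("1958-01-03", "1961-02-23")),
                 end = as.Date(c("1961-10-11", "2012-04-28")))

# One row per month: first day and last day of the month
periods <- data.table(start = seq(as.Date('1950-01-01','%Y-%m-%d'), as.Date('2021-09-01','%Y-%m-%d'), 'months'),
                      end = seq(as.Date('1950-02-01','%Y-%m-%d'), as.Date('2021-10-01', '%Y-%m-%d'), 'months') -1)

# foverlaps needs the interval columns as the key of its second argument
setkey(DT, start, end)
setkey(periods, start, end)

# Overlap join; months without any entry keep an NA id and so count as 0
res <- foverlaps(periods, DT, nomatch = NA)[, .(N = sum(!is.na(id))), by = .(i.start, i.end)]

# Step plot of the number of active entries per month
plot(N ~ i.start, data = res, type = "s")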

res[N == 2]
#      i.start      i.end N
#1: 1961-02-01 1961-02-28 2
#2: 1961-03-01 1961-03-31 2
#3: 1961-04-01 1961-04-30 2
#4: 1961-05-01 1961-05-31 2
#5: 1961-06-01 1961-06-30 2
#6: 1961-07-01 1961-07-31 2
#7: 1961-08-01 1961-08-31 2
#8: 1961-09-01 1961-09-30 2
#9: 1961-10-01 1961-10-31 2
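
As an alternative sketch, not part of the answer above, the same monthly counts can likely be obtained with a data.table non-equi join, which avoids setting keys. The names month, month_end, counts and N are my own choices for illustration. The join condition assumes an entry is active in a month whenever it overlaps that month by at least one day, which matches the default overlap type ("any") of foverlaps.

```r
library(data.table)

# Same sample entries as in the answer above
DT <- data.table(id = 1:2,
                 start = as.Date(c("1958-01-03", "1961-02-23")),
                 end = as.Date(c("1961-10-11", "2012-04-28")))

# One row per month: first day (month) and last day (month_end)
periods <- data.table(
  month = seq(as.Date("1950-01-01"), as.Date("2021-09-01"), by = "month"),
  month_end = seq(as.Date("1950-02-01"), as.Date("2021-10-01"), by = "month") - 1
)

# An entry overlaps a month if it starts on or before the month's last day
# and ends on or after the month's first day.
# by = .EACHI evaluates .N once per row of periods; months with no match get 0.
counts <- DT[periods, on = .(start <= month_end, end >= month), .N, by = .EACHI]

# The result rows follow the row order of periods, so N can be attached directly
periods[, N := counts$N]

periods[N == 2]
```

With the sample data this should give the same nine months as res[N == 2] above (February to October 1961).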