Counting data.table entries between dates
I have a data.table containing a series of entries with start and end dates, like this:
id | start | end |
---|---|---|
1 | 1958-01-03 | 1962-10-11 |
2 | 1961-02-23 | 2012-04-28 |
and so on.
I would like to count, month by month, how many of these entries are active. So I tried this:
data.table(
  month = seq(as.Date('1950-01-01','%Y-%m-%d'), as.Date('2021-09-01','%Y-%m-%d'), 'months'),
  month_end = seq(as.Date('1950-02-01','%Y-%m-%d'), as.Date('2021-10-01', '%Y-%m-%d'), 'months') - 1
) %>%
  .[, count := satcat[start >= month & month_end <= end, .N]] %>%
  .[]
However, this produces warnings:
Warning message in `>.default`(start, month):
“longer object length is not a multiple of shorter object length”
Warning message in `<=.default`(month_end, end):
“longer object length is not a multiple of shorter object length”
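and `count` is the same number in every row. Why does this happen, and what is the correct way to do it? I feel there should be some `apply` solution, but I can't work it out.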
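On the "why": as far as I can tell, `satcat[...]` inside `j` is evaluated once for the whole table rather than once per month. `start` and `end` resolve to columns of `satcat`, while `month` and `month_end` resolve to the full columns of the outer table, so R compares vectors of different lengths by recycling (hence the warnings), and `.N` is a single number that `:=` copies into every row. A minimal sketch of this, assuming a made-up three-row `satcat` and a two-row `month_tbl` (both hypothetical stand-ins for the real data):

```r
library(data.table)

# Hypothetical stand-in for the real satcat table (id 3 is invented)
satcat <- data.table(id = 1:3,
                     start = as.Date(c("1958-01-03", "1961-02-23", "1970-01-01")),
                     end   = as.Date(c("1962-10-11", "2012-04-28", "1980-01-01")))

# Hypothetical two-month table standing in for the full month sequence
month_tbl <- data.table(month     = as.Date(c("1961-01-01", "1961-03-01")),
                        month_end = as.Date(c("1961-01-31", "1961-03-31")))

# Without `by`, satcat[...] runs once: `start` (length 3) is compared with
# `month` (length 2), R recycles with a warning, and the single .N is
# assigned to every row of month_tbl.
month_tbl[, count := satcat[start >= month & month_end <= end, .N]]

# With `by`, `month` and `month_end` are length-one values inside j, so the
# subset is evaluated once per month. The condition used here is the usual
# overlap test: the entry starts before the month ends and ends after it starts.
month_tbl[, per_month := satcat[start <= month_end & end >= month, .N],
          by = .(month, month_end)]

print(month_tbl)
```

The grouped version gives a per-month count, but it runs one subset of `satcat` per month, so it may be slow when either table is large.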
This is a job for `foverlaps`:
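library(data.table)
# Example entries modelled on the question (here id 1 ends on 1961-10-11)
DT <- data.table(id = 1:2,
                 start = as.Date(c("1958-01-03", "1961-02-23")),
                 end = as.Date(c("1961-10-11", "2012-04-28")))
# One row per calendar month: first day and last day
periods <- data.table(start = seq(as.Date('1950-01-01','%Y-%m-%d'), as.Date('2021-09-01','%Y-%m-%d'), 'months'),
                      end = seq(as.Date('1950-02-01','%Y-%m-%d'), as.Date('2021-10-01', '%Y-%m-%d'), 'months') - 1)
# foverlaps() requires DT (its y argument) to be keyed, with the interval columns last
setkey(DT, start, end)
setkey(periods, start, end)
# Overlap join, then count matched ids per month (months without a match have NA id and count as 0)
res <- foverlaps(periods, DT, nomatch = NA)[, .(N = sum(!is.na(id))), by = .(i.start, i.end)]
# Step plot of the number of active entries over time
plot(N ~ i.start, data = res, type = "s")
# Months in which both entries are active
res[N == 2]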
# i.start i.end N
#1: 1961-02-01 1961-02-28 2
#2: 1961-03-01 1961-03-31 2
#3: 1961-04-01 1961-04-30 2
#4: 1961-05-01 1961-05-31 2
#5: 1961-06-01 1961-06-30 2
#6: 1961-07-01 1961-07-31 2
#7: 1961-08-01 1961-08-31 2
#8: 1961-09-01 1961-09-30 2
#9: 1961-10-01 1961-10-31 2
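A non-equi join is another way to get the same counts without setting keys. The following is only a sketch under the same example data; the column names `p_start` and `p_end` and the result name `res2` are chosen here for illustration and are not part of the code above:

```r
library(data.table)

DT <- data.table(id = 1:2,
                 start = as.Date(c("1958-01-03", "1961-02-23")),
                 end   = as.Date(c("1961-10-11", "2012-04-28")))

# One row per calendar month; p_start / p_end are illustrative names
periods <- data.table(
  p_start = seq(as.Date("1950-01-01"), as.Date("2021-09-01"), by = "month"),
  p_end   = seq(as.Date("1950-02-01"), as.Date("2021-10-01"), by = "month") - 1
)

# For each month (each row of periods), find the rows of DT whose interval
# overlaps it: the entry starts on or before the month's last day and ends
# on or after its first day. by = .EACHI evaluates j once per row of periods;
# months without a match have NA id and therefore count as 0.
res2 <- DT[periods, on = .(start <= p_end, end >= p_start),
           .(N = sum(!is.na(id))), by = .EACHI]

# by = .EACHI keeps the row order of periods, so N can be attached directly
periods[, N := res2$N]

periods[N == 2]
```

With this data, `periods[N == 2]` should select the same months as `res[N == 2]` above, February through October 1961.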