Count calendar days within a date interval using lubridate

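I have a dataset of hospital admission and discharge dates, from which I want to generate the number of occupied beds for each calendar day over a three-year period. I am using the tidyverse and lubridate packages.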

So far, my approach has been to convert the admit/discharge columns into an interval (the data is sensitive, so I can't share the actual dates):

d <- d %>%
  mutate(duration = admit %--% discharge)

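Then I create a tibble in which each row corresponds to one day of the time range, plus a column of zeros that can be incremented in a for loop: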

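t <- 
  tibble(
    # seq() keeps the Date class; as.Date() on a bare integer range
    # needs an explicit origin in R versions before 4.3
    days = seq(as.Date("2017-01-01"), as.Date("2019-12-31"), by = "day"), 
    count = 0
  )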

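Unfortunately, I can't figure out how to write a for loop that counts, for each day, how many intervals it falls within. This is my attempt so far, which always gives me a uniform value of 24: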

for(i in timeline$days) {
  if (i %within% d$duration)
    timeline$count = timeline$count + 1
}

Sample data.

library(dplyr)
set.seed(42)
d <- tibble(admit = Sys.Date() - sample(300, size = 1000, replace = TRUE)) %>%
  mutate(discharge = admit + sample(0:30, size = 1000, replace = TRUE))
d
# # A tibble: 1,000 x 2
#    admit      discharge 
#    <date>     <date>    
#  1 2019-06-18 2019-07-14
#  2 2019-06-11 2019-06-12
#  3 2019-12-24 2020-01-18
#  4 2019-07-13 2019-07-29
#  5 2019-09-08 2019-09-23
#  6 2019-10-15 2019-10-15
#  7 2019-08-11 2019-08-28
#  8 2020-02-07 2020-02-29
#  9 2019-09-03 2019-09-10
# 10 2019-08-20 2019-09-14
# # ... with 990 more rows

We can use Map (or purrr::pmap) to produce a list of date ranges/sequences:

Map(seq.Date, d$admit, d$discharge, list(by = "days"))[1:2]
# [[1]]
#  [1] "2019-06-18" "2019-06-19" "2019-06-20" "2019-06-21" "2019-06-22" "2019-06-23" "2019-06-24"
#  [8] "2019-06-25" "2019-06-26" "2019-06-27" "2019-06-28" "2019-06-29" "2019-06-30" "2019-07-01"
# [15] "2019-07-02" "2019-07-03" "2019-07-04" "2019-07-05" "2019-07-06" "2019-07-07" "2019-07-08"
# [22] "2019-07-09" "2019-07-10" "2019-07-11" "2019-07-12" "2019-07-13" "2019-07-14"
# [[2]]
# [1] "2019-06-11" "2019-06-12"
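The purrr route mentioned above is not shown, so here is a minimal sketch of what it might look like. It uses `map2` because there are only two inputs; `pmap` would be the analogue for more. The two stays are copied from the first two rows of the sample data, and the name `ranges` is just an illustrative choice.

```r
library(purrr)

# First two rows of the sample data above
admit     <- as.Date(c("2019-06-18", "2019-06-11"))
discharge <- as.Date(c("2019-07-14", "2019-06-12"))

# One daily sequence per stay; `by = "day"` is passed on to seq()
ranges <- map2(admit, discharge, seq, by = "day")

lengths(ranges)  # number of calendar days covered by each stay
```

Each list element is a Date vector running from admission to discharge, inclusive of both ends.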

Then combine these, tabulate them (with table), and enframe the result:

Map(seq.Date, d$admit, d$discharge, list(by = "days")) %>%
  do.call(c, .) %>%
  table() %>%
  tibble::enframe(name = "date", value = "count") %>%
  # because `table` preserves a *character* representation of the Date
  mutate(date = as.Date(date)) %>%
  arrange(date)
# # A tibble: 328 x 2
#    date       count  
#    <date>     <table>
#  1 2019-05-24  1     
#  2 2019-05-25  3     
#  3 2019-05-26  7     
#  4 2019-05-27  8     
#  5 2019-05-28  9     
#  6 2019-05-29 14     
#  7 2019-05-30 20     
#  8 2019-05-31 20     
#  9 2019-06-01 20     
# 10 2019-06-02 21     
# # ... with 318 more rows
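Note that the count column above has type `<table>` rather than integer. If that is inconvenient, one possible base-R variation is to go through `as.data.frame()` instead of `enframe()`. This is only a sketch: the three stays and the names `stays` and `occupied` are made up for illustration.

```r
# Hypothetical toy data: three overlapping stays
stays <- data.frame(
  admit     = as.Date(c("2019-01-01", "2019-01-02", "2019-01-02")),
  discharge = as.Date(c("2019-01-03", "2019-01-02", "2019-01-04"))
)

# Expand every stay to its calendar days and flatten into one Date vector
days <- do.call(c, Map(seq.Date, stays$admit, stays$discharge, list(by = "days")))

# Tabulate; as.data.frame() yields a plain integer Freq column
occupied <- as.data.frame(table(date = days), stringsAsFactors = FALSE)
names(occupied)[2] <- "count"

# table() keeps only a character representation of the dates
occupied$date <- as.Date(occupied$date)

occupied
```

With this toy data the four days from 2019-01-01 to 2019-01-04 have 1, 3, 2 and 1 occupied beds respectively. Days with no occupancy do not appear at all, so a join against a full calendar is still needed if zero rows are wanted.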

Here is another approach using tidyverse functions.

library(tidyverse)

d %>%
  mutate(days = map2(admit, discharge, seq, by = "day")) %>%
  unnest(days) %>%
  count(days) %>%
  right_join(t, by = "days") %>%
  mutate(n = coalesce(n, as.integer(count))) %>%
  select(-count)

We create a sequence of dates between admit and discharge, count the occurrences of each unique date, and join the result with t so that all the dates in t are retained.
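Returning to the loop in the question, here is a minimal sketch of one way it could be repaired. It is not a confirmed diagnosis of the reported value of 24, but three things stand out in the original: iterating with `for` over a Date vector yields bare numbers rather than dates, `%within%` applied to the whole `duration` vector returns one logical per interval rather than the single value `if` expects, and `timeline$count + 1` increments every row at once. The sketch indexes by position and sums the logical vector instead. The three stays and the five-day timeline are made-up illustration data.

```r
library(lubridate)

# Hypothetical toy data: three overlapping stays
admit     <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-02"))
discharge <- as.Date(c("2019-01-03", "2019-01-02", "2019-01-04"))
duration  <- admit %--% discharge   # vector of lubridate Intervals

timeline <- data.frame(
  days  = seq(as.Date("2019-01-01"), as.Date("2019-01-05"), by = "day"),
  count = 0L
)

# Loop over positions so that each day keeps its Date class
for (k in seq_along(timeline$days)) {
  day <- timeline$days[k]
  # One TRUE/FALSE per stay; the sum is the number of beds occupied that day
  timeline$count[k] <- sum(day %within% duration)
}

timeline
```

For this toy data the counts come out as 1, 3, 2, 1 and 0. On three years of real data the loop does one pass over all intervals per day, so the sequence-and-count approaches above are likely to be faster.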