使用 lubridate 计算日期间隔内的日历天数
Count calendar days within a date interval using lubridate
我有入院和出院天数的数据集,我想从中生成三年期间每个日历日的占用床数。我正在使用 tidyverse 和 lubridate 包。
到目前为止,我的方法是将 admit/discharge 列转换为间隔(数据很敏感,所以我不能分享实际日期):
d <- d %>%
mutate(duration = admit %--% discharge)
然后创建一个小标题,其中每一行对应于时间范围,加上一列可以在 for 循环中添加的零:
t <-
tibble(
days = as.Date(date("2017-01-01"):date("2019-12-31")),
count = 0
)
不幸的是,我不知道如何创建一个 for 循环来对每个间隔内的天数求和。到目前为止,这是我的尝试,它始终为我提供 24 的统一值:
for(i in timeline$days) {
if (i %within% d$duration)
timeline$count = timeline$count + 1
}
示例数据。
library(dplyr)
set.seed(42)
d <- tibble(admit = Sys.Date() - sample(300, size = 1000, replace = TRUE)) %>%
mutate(discharge = admit + sample(0:30, size = 1000, replace = TRUE))
d
# # A tibble: 1,000 x 2
# admit discharge
# <date> <date>
# 1 2019-06-18 2019-07-14
# 2 2019-06-11 2019-06-12
# 3 2019-12-24 2020-01-18
# 4 2019-07-13 2019-07-29
# 5 2019-09-08 2019-09-23
# 6 2019-10-15 2019-10-15
# 7 2019-08-11 2019-08-28
# 8 2020-02-07 2020-02-29
# 9 2019-09-03 2019-09-10
# 10 2019-08-20 2019-09-14
# # ... with 990 more rows
我们可以用Map
(或purrr::pmap
)产生日期ranges/sequences的列表:
Map(seq.Date, d$admit, d$discharge, list(by = "days"))[1:2]
# [[1]]
# [1] "2019-06-18" "2019-06-19" "2019-06-20" "2019-06-21" "2019-06-22" "2019-06-23" "2019-06-24"
# [8] "2019-06-25" "2019-06-26" "2019-06-27" "2019-06-28" "2019-06-29" "2019-06-30" "2019-07-01"
# [15] "2019-07-02" "2019-07-03" "2019-07-04" "2019-07-05" "2019-07-06" "2019-07-07" "2019-07-08"
# [22] "2019-07-09" "2019-07-10" "2019-07-11" "2019-07-12" "2019-07-13" "2019-07-14"
# [[2]]
# [1] "2019-06-11" "2019-06-12"
然后组合这些,将它们制表(用table
),然后enframe
它们:
Map(seq.Date, d$admit, d$discharge, list(by = "days")) %>%
do.call(c, .) %>%
table() %>%
tibble::enframe(name = "date", value = "count") %>%
# because `table` preserves a *character* representation of the Date
mutate(date = as.Date(date)) %>%
arrange(date)
# # A tibble: 328 x 2
# date count
# <date> <table>
# 1 2019-05-24 1
# 2 2019-05-25 3
# 3 2019-05-26 7
# 4 2019-05-27 8
# 5 2019-05-28 9
# 6 2019-05-29 14
# 7 2019-05-30 20
# 8 2019-05-31 20
# 9 2019-06-01 20
# 10 2019-06-02 21
# # ... with 318 more rows
这是使用 tidyverse
函数的另一种方法。
library(tidyverse)
d %>%
mutate(days = map2(admit, discharge, seq, by = "day")) %>%
unnest(days) %>%
count(days) %>%
right_join(t, by = "days") %>%
mutate(n = coalesce(n, as.integer(count))) %>%
select(-count)
我们在 admit
和 discharge
之间创建日期序列,每个唯一日期 count
,将其与 t
连接,以便 [=15 中的所有日期=] 保持原样。
我有入院和出院天数的数据集,我想从中生成三年期间每个日历日的占用床数。我正在使用 tidyverse 和 lubridate 包。
到目前为止,我的方法是将 admit/discharge 列转换为间隔(数据很敏感,所以我不能分享实际日期):
d <- d %>%
mutate(duration = admit %--% discharge)
然后创建一个小标题,其中每一行对应于时间范围,加上一列可以在 for 循环中添加的零:
t <-
tibble(
days = as.Date(date("2017-01-01"):date("2019-12-31")),
count = 0
)
不幸的是,我不知道如何创建一个 for 循环来对每个间隔内的天数求和。到目前为止,这是我的尝试,它始终为我提供 24 的统一值:
for(i in timeline$days) {
if (i %within% d$duration)
timeline$count = timeline$count + 1
}
示例数据。
library(dplyr)
set.seed(42)
d <- tibble(admit = Sys.Date() - sample(300, size = 1000, replace = TRUE)) %>%
mutate(discharge = admit + sample(0:30, size = 1000, replace = TRUE))
d
# # A tibble: 1,000 x 2
# admit discharge
# <date> <date>
# 1 2019-06-18 2019-07-14
# 2 2019-06-11 2019-06-12
# 3 2019-12-24 2020-01-18
# 4 2019-07-13 2019-07-29
# 5 2019-09-08 2019-09-23
# 6 2019-10-15 2019-10-15
# 7 2019-08-11 2019-08-28
# 8 2020-02-07 2020-02-29
# 9 2019-09-03 2019-09-10
# 10 2019-08-20 2019-09-14
# # ... with 990 more rows
我们可以用Map
(或purrr::pmap
)产生日期ranges/sequences的列表:
Map(seq.Date, d$admit, d$discharge, list(by = "days"))[1:2]
# [[1]]
# [1] "2019-06-18" "2019-06-19" "2019-06-20" "2019-06-21" "2019-06-22" "2019-06-23" "2019-06-24"
# [8] "2019-06-25" "2019-06-26" "2019-06-27" "2019-06-28" "2019-06-29" "2019-06-30" "2019-07-01"
# [15] "2019-07-02" "2019-07-03" "2019-07-04" "2019-07-05" "2019-07-06" "2019-07-07" "2019-07-08"
# [22] "2019-07-09" "2019-07-10" "2019-07-11" "2019-07-12" "2019-07-13" "2019-07-14"
# [[2]]
# [1] "2019-06-11" "2019-06-12"
然后组合这些,将它们制表(用table
),然后enframe
它们:
Map(seq.Date, d$admit, d$discharge, list(by = "days")) %>%
do.call(c, .) %>%
table() %>%
tibble::enframe(name = "date", value = "count") %>%
# because `table` preserves a *character* representation of the Date
mutate(date = as.Date(date)) %>%
arrange(date)
# # A tibble: 328 x 2
# date count
# <date> <table>
# 1 2019-05-24 1
# 2 2019-05-25 3
# 3 2019-05-26 7
# 4 2019-05-27 8
# 5 2019-05-28 9
# 6 2019-05-29 14
# 7 2019-05-30 20
# 8 2019-05-31 20
# 9 2019-06-01 20
# 10 2019-06-02 21
# # ... with 318 more rows
这是使用 tidyverse
函数的另一种方法。
library(tidyverse)
d %>%
mutate(days = map2(admit, discharge, seq, by = "day")) %>%
unnest(days) %>%
count(days) %>%
right_join(t, by = "days") %>%
mutate(n = coalesce(n, as.integer(count))) %>%
select(-count)
我们在 admit
和 discharge
之间创建日期序列,每个唯一日期 count
,将其与 t
连接,以便 [=15 中的所有日期=] 保持原样。