Calculating the Average Occupancy with tidyverse only
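I am calculating the average arrivals and the average occupancy for each hour of the day using tidyverse only.
However, the example I tried (reproduced below) does not actually calculate the average occupancy; it calculates the number of people at a specific time instead.
Suppose one patient arrives at a hospital Emergency Department at 10 am on 10 December 2018 and leaves at 7:45 the next day. Occupancy then has a value of 1.00 patient for every hour from 10 am through 7 am the next day, and 0 for 8 am and 9 am. Averaged over the two dates, that gives 0.5 for every hour from 10 am on the day the patient arrived through 7 am on the day of discharge, and an average of 0 for 8 am and 9 am. Arrivals work the same way, except that only the hour in which the patient arrived is counted, not every hour of the stay. That is the difference between Occupancy and Arrivals. All the answers to my previous requests for help seem to solve for average Arrivals rather than Occupancy, although I asked for averaged Occupancy.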
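To make the arithmetic above concrete, here is a minimal base R sketch for that single hypothetical patient. It assumes a patient counts as present in every clock hour the stay touches, that the data span exactly two days, and that timestamps are in UTC; the variable names are made up for illustration.

```r
# One hypothetical patient: arrives 2018-12-10 10:00, leaves 2018-12-11 07:45.
adm  <- as.POSIXct("2018-12-10 10:00:00", tz = "UTC")
disc <- as.POSIXct("2018-12-11 07:45:00", tz = "UTC")

n_days <- 2  # assumption: averages are taken over exactly two calendar days

# Every clock hour the stay touches, from the arrival hour to the discharge hour.
occupied <- seq(as.POSIXct(trunc(adm, "hours")),
                as.POSIXct(trunc(disc, "hours")),
                by = "hour")

# Count how often each hour of the day (0-23) is occupied, then average over days.
occupied_hours <- as.integer(format(occupied, "%H"))
average_occ <- tabulate(occupied_hours + 1L, nbins = 24) / n_days
names(average_occ) <- 0:23

# Arrivals only count the hour in which the patient arrived.
arrival_hour <- as.integer(format(adm, "%H"))
average_arr <- tabulate(arrival_hour + 1L, nbins = 24) / n_days
names(average_arr) <- 0:23

average_occ
average_arr
```

Under these assumptions the occupancy average is 0.5 for every hour except 8 and 9, where it is 0, while the arrivals average is 0.5 for hour 10 only.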
This is an example of what I tried in the past; I reproduce it below.
df <- structure(list(ID = c(101, 102, 103, 104, 105, 106, 107), Adm =
structure(c(1326309720, 1326309900, 1328990700, 1328997240,
1329000840, 1329004440, 1329004680),
class = c("POSIXct", "POSIXt"), tzone = ""), Disc =
structure(c(1326313800, 1326317340, 1326317460, 1326324660,
1326328260, 1326335460, 1326335460),
class = c("POSIXct", "POSIXt"), tzone = "")),
.Names = c("ID", "Adm", "Disc"),
row.names = c(NA, -7L), class = "data.frame")
library(tidyverse)
df %>%
group_by(ID) %>%
mutate(occupancy = ifelse(last(Disc) > first(Adm) + 60*60, 1, 0))
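This is a minimal example; for simplicity, it shows the kind of reproducible data I have. For data protection reasons, none of the original data can be disclosed.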
df <- structure(list(ID = 101:103,
`Admissions <- as.POSIXct(c("2018-12-10 09:30:00",
"2018-12-10 10:15:00",
"2018-12-11 08:05:00"),
tz = "Europe/London")` =
structure(c(1544434200, 1544436900, 1544519100),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/London"),
`Discharges <- as.POSIXct(c("2018-12-10 12:30:00",
"2018-12-11 07:45:00",
"2018-12-11 09:05-00"),
tz = "Europe/London")` =
structure(c(1544445000, 1544514300, 1544519100),
class = c("POSIXct", "POSIXt"),
tzone = "Europe/London")), row.names = c(NA, -3L),
class = c("tbl_df", "tbl", "data.frame"))
The expected output is:
output <- structure(list(
Hour = 0:23,
Average_arrivals = c(0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Average_occ = c(0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0, 0.5, 1,
1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.5)),
row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"),
spec = structure(list(cols = list(X1 =
structure(list(), class = c("collector_integer", "collector")),
Hour = structure(list(), class =c("collector_integer","collector")),
Average_arrivals = structure(list(),
class = c("collector_double", "collector")),
Average_occ = structure(list(), class = c("collector_double",
"collector"))),
default = structure(list(),
class = c("collector_guess","collector"))),
class = "col_spec"))
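Here is an approach using tidyverse. First, I use gather to convert to long format, then create a "change" column that is +1 for an admission and -1 for a discharge.
Then I summarize by hour (this could be more granular if desired, e.g. "5 min") and use padr::pad to add all the hours that are not mentioned. I also add extra hours at the end so that there is a full set of 48 hours.
Occupancy is then the cumulative sum of the changes. By grouping by hour of the day across the 2 days, we get Average_arrivals and Average_occ.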
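To see the cumulative-sum step in isolation, here is a toy sketch in base R. The `change` values are made up for illustration and are not the sample data.

```r
# Hypothetical net change per consecutive hour:
# +1 for each admission, -1 for each discharge, 0 when nothing happens.
change <- c(1, 1, 0, -1, 0, -1)

# Running total = number of patients present after each hour's events.
occupancy <- cumsum(change)
occupancy
```

Because admissions and discharges are netted within the hour they fall in, a patient discharged partway through an hour is already counted as gone for that hour, while a patient admitted partway through an hour counts for the whole hour.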
Data
# Note, I could not load the sample data as provided, as the variable
# names included the desired data as text.
df <- data.frame(ID = 101:103,
Admissions = as.POSIXct(c("2018-12-10 09:30:00",
"2018-12-10 10:15:00", "2018-12-11 08:05:00")),
Discharges = as.POSIXct(c("2018-12-10 12:30:00",
"2018-12-11 07:45:00", "2018-12-11 09:05:00")))
Solution
library(tidyverse)
library(lubridate)  # floor_date(), dhours(), hour()

df_flat <- df %>%
  # Long format: one row per admission or discharge event
  gather(status, time, Admissions:Discharges) %>%
  mutate(change = if_else(status == "Admissions", 1, -1)) %>%
  group_by(time_hr = floor_date(time, "1 hour")) %>%
  summarize(arrivals = sum(status == "Admissions"),
            change   = sum(change)) %>%
  # Here, adding add'l rows so all hours have 2 instances
  padr::pad(end_val = min(.$time_hr) + dhours(47)) %>%
  replace_na(list(arrivals = 0, change = 0)) %>%
  # Occupancy is the running total of admissions minus discharges
  mutate(occupancy = cumsum(change))

output <- df_flat %>%
  # Name the grouping column so it prints as "hour"
  group_by(hour = hour(time_hr)) %>%
  summarize(Average_arrivals = mean(arrivals),
            Average_occ = mean(occupancy))
Output
output
# A tibble: 24 x 3
# hour Average_arrivals Average_occ
# <int> <dbl> <dbl>
# 1 0 0 0.5
# 2 1 0 0.5
# 3 2 0 0.5
# 4 3 0 0.5
# 5 4 0 0.5
# 6 5 0 0.5
# 7 6 0 0.5
# 8 7 0 0
# 9 8 0.5 0.5
# 10 9 0.5 0.5
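If a patient should instead count as present in every clock hour the stay touches (so that a 07:45 discharge still occupies hour 7), one possible variant is to expand each stay into one row per hour. This is only a sketch under that assumption, not the method used above: `n_days` is set to 2 by hand, the timestamps are parsed as UTC, and the names `hour_start`, `patient_hours` and `occ_by_hour` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

df <- tibble(
  ID = 101:103,
  Admissions = ymd_hms(c("2018-12-10 09:30:00", "2018-12-10 10:15:00",
                         "2018-12-11 08:05:00")),
  Discharges = ymd_hms(c("2018-12-10 12:30:00", "2018-12-11 07:45:00",
                         "2018-12-11 09:05:00"))
)

n_days <- 2  # assumption: the data span exactly two calendar days

occ_by_hour <- df %>%
  rowwise() %>%
  # One list element per stay: every clock hour from admission to discharge
  mutate(hour_start = list(seq(floor_date(Admissions, "hour"),
                               floor_date(Discharges, "hour"),
                               by = "hour"))) %>%
  ungroup() %>%
  unnest(hour_start) %>%
  # Patient-hours per hour of the day, with empty hours filled in as 0
  count(hour = hour(hour_start), name = "patient_hours") %>%
  complete(hour = 0:23, fill = list(patient_hours = 0L)) %>%
  mutate(Average_occ = patient_hours / n_days)

occ_by_hour
```

With these three stays, hour 7 comes out as 0.5 rather than 0, because the second patient is still counted during the hour of discharge.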