How can I bring rows into my table from a list inside the same table, based on date conditions, with tidyverse?
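I am trying to bring rows from a list into a tibble I have, based on conditions on several dates. I would like to solve this with the tidyverse libraries.
This is the first data set I have: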
table_age <- structure(list(id = c(1, 1, 2, 3, 4, 5, 6), age_band = c("5_9",
"5_9", "10_14", "15-19", "20-24", "5_9", "10_14"), start_date = c("2020-01-01 00:08",
"2020-02-01 00:00", "2020-01-08 10:08", "2020-01-02 17:08", "2020-01-08 16:08",
"2020-01-10 08:08", "2020-01-03 09:08"), end_date = c("2020-01-04 10:08",
"2020-02-11 00:00", "2020-01-09 10:08", "2020-01-03 19:08", "2020-01-11 16:08",
"2019-01-30 08:08", "2020-01-05 09:08")), row.names = c(NA, -7L
), class = c("tbl_df", "tbl", "data.frame"))
It looks like this:
table_age
# A tibble: 7 x 4
     id age_band start_date       end_date
  <dbl> <chr>    <chr>            <chr>
1     1 5_9      2020-01-01 00:08 2020-01-04 10:08
2     1 5_9      2020-02-01 00:00 2020-02-11 00:00
3     2 10_14    2020-01-08 10:08 2020-01-09 10:08
4     3 15-19    2020-01-02 17:08 2020-01-03 19:08
5     4 20-24    2020-01-08 16:08 2020-01-11 16:08
6     5 5_9      2020-01-10 08:08 2019-01-30 08:08
7     6 10_14    2020-01-03 09:08 2020-01-05 09:08
My second data set is:
table <- structure(list(id = c(1, 1, 2, 2, 3, 4, 5, 6), med_name_one = c("Co-amoxiclav",
"doxycycline", "Gentamicin", "Co-trimoxazole", "Gentamicin",
"Co-trimoxazole", "Sodium Chloride", "Piperacillin"), med_name_two = c(NA,
"Gentamicin", "Co-trimoxazole", NA, NA, NA, NA, NA), mg_one = c("411 mg",
"120 mg", "11280 mg", "8 mg", "11280 mg", "8 mg", "411 mg", "120 mg"
), mg_two = c(NA, "11280 mg", "8 mg", NA, NA, NA, NA, NA), administration_datetime = c("2020-01-03 10:08",
"2020-01-01 11:08", "2020-01-02 19:08", "2020-01-08 20:08", "2020-01-02 19:08",
"2020-01-08 20:08", "2019-01-30 08:08", "2020-01-03 09:08")), row.names = c(NA,
-8L), class = c("tbl_df", "tbl", "data.frame"))
It looks like this:
table
# A tibble: 8 x 6
     id med_name_one    med_name_two   mg_one   mg_two   administration_datetime
  <dbl> <chr>           <chr>          <chr>    <chr>    <chr>
1     1 Co-amoxiclav    NA             411 mg   NA       2020-01-03 10:08
2     1 doxycycline     Gentamicin     120 mg   11280 mg 2020-01-01 11:08
3     2 Gentamicin      Co-trimoxazole 11280 mg 8 mg     2020-01-02 19:08
4     2 Co-trimoxazole  NA             8 mg     NA       2020-01-08 20:08
5     3 Gentamicin      NA             11280 mg NA       2020-01-02 19:08
6     4 Co-trimoxazole  NA             8 mg     NA       2020-01-08 20:08
7     5 Sodium Chloride NA             411 mg   NA       2019-01-30 08:08
8     6 Piperacillin    NA             120 mg   NA       2020-01-03 09:08
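The conditions under which I have to merge these two data sets are:
1. If the administration_datetime in table falls inside the interval between start_date and end_date from table_age, merge all the information from both tibbles.
2. If the administration_datetime in table falls outside the interval between start_date and end_date from table_age, keep the id information from table_age and give NA for the columns coming from table. The reason to keep such rows is that they appear in table_age and I do not want to lose information from that table. This may look like the opposite of point 3 below, but it is not.
3. However, if the administration_datetime in table falls outside both date-time intervals in table_age, do not merge those rows at all. If I did, the interval date-times would be repeated, and I do not want that to happen.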
This is the kind of table I want:
table_answer
# A tibble: 8 x 9
     id med_name_one   med_name_two mg_one   mg_two   administration_datetime age_band start_date       end_date
  <dbl> <chr>          <chr>        <chr>    <chr>    <chr>                   <chr>    <chr>            <chr>
1     1 Co-amoxiclav   NA           411 mg   NA       2020-01-03 10:08        5_9      2020-01-01 00:08 2020-01-04 10:08
2     1 NA             NA           NA       NA       NA                      5_9      2020-02-01 00:00 2020-02-11 00:00
3     1 doxycycline    Gentamicin   120 mg   11280 mg 2020-01-01 11:08        5_9      2020-01-01 00:08 2020-01-04 10:08
4     2 Co-trimoxazole NA           8 mg     NA       2020-01-08 20:08        10_14    2020-01-08 10:08 2020-01-09 10:08
5     3 Gentamicin     NA           11280 mg NA       2020-01-02 19:08        15-19    2020-01-02 17:08 2020-01-03 19:08
6     4 Co-trimoxazole NA           8 mg     NA       2020-01-08 20:08        20-24    2020-01-08 16:08 2020-01-11 16:08
7     5 NA             NA           NA       NA       NA                      5_9      2020-01-10 08:08 2019-01-30 08:08
8     6 Piperacillin   NA           120 mg   NA       2020-01-03 09:08        10_14    2020-01-03 09:08 2020-01-05 09:08
- I tried inner_join from the tidyverse, but without success:
table_joined <- inner_join(table_age, table)
This is the result:
table_joined
# A tibble: 10 x 9
      id age_band start_date       end_date        med_name_one    med_name_two   mg_one  mg_two  administration_dateti…
   <dbl> <chr>    <chr>            <chr>           <chr>           <chr>          <chr>   <chr>   <chr>
 1     1 5_9      2020-01-01 00:08 2020-01-04 10:… Co-amoxiclav    NA             411 mg  NA      2020-01-03 10:08
 2     1 5_9      2020-01-01 00:08 2020-01-04 10:… doxycycline     Gentamicin     120 mg  11280 … 2020-01-01 11:08
 3     1 5_9      2020-02-01 00:00 2020-02-11 00:… Co-amoxiclav    NA             411 mg  NA      2020-01-03 10:08
 4     1 5_9      2020-02-01 00:00 2020-02-11 00:… doxycycline     Gentamicin     120 mg  11280 … 2020-01-01 11:08
 5     2 10_14    2020-01-08 10:08 2020-01-09 10:… Gentamicin      Co-trimoxazole 11280 … 8 mg    2020-01-02 19:08
 6     2 10_14    2020-01-08 10:08 2020-01-09 10:… Co-trimoxazole  NA             8 mg    NA      2020-01-08 20:08
 7     3 15-19    2020-01-02 17:08 2020-01-03 19:… Gentamicin      NA             11280 … NA      2020-01-02 19:08
 8     4 20-24    2020-01-08 16:08 2020-01-11 16:… Co-trimoxazole  NA             8 mg    NA      2020-01-08 20:08
 9     5 5_9      2020-01-10 08:08 2019-01-30 08:… Sodium Chloride NA             411 mg  NA      2019-01-30 08:08
10     6 10_14    2020-01-03 09:08 2020-01-05 09:… Piperacillin    NA             120 mg  NA      2020-01-03 09:08
- I tried nesting table into table_age, then extracting the rows I need with a function I wrote, applied with map from the purrr library:
table_nested <- nest_join(table_age, table)
get_medication_name <- function(medication_name_df) {
  medication_name <- medication_name_df %>%
    dplyr::group_by(id) %>%
    dplyr::arrange(administered_datetime) %>%
    pull(med_name_one)
}
table_answer <- mutate(medication_name = purrr::map(table_nested, get_medication_name))
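But as far as I can tell I am not setting any condition here; I was only hoping this function would get me closer. On top of that, I get all sorts of errors.
Is there a way to achieve this? I would prefer a result along the lines of the second attempt. I could of course go in other directions, but the second one is something I can build on with the broader work I already have.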
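For reference, the nesting idea from the second attempt can be made to carry the date condition. The following is only a sketch on a reduced copy of the data (ids 1 and 2), not a definitive implementation: the names `meds`, `med_rows`, `to_time` and `keep_in_window` are invented here for illustration, and treating the timestamps as UTC is an assumption.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Reduced copy of the question's data (assumption: ids 1 and 2 only).
table_age <- tibble(
  id         = c(1, 1, 2),
  age_band   = c("5_9", "5_9", "10_14"),
  start_date = c("2020-01-01 00:08", "2020-02-01 00:00", "2020-01-08 10:08"),
  end_date   = c("2020-01-04 10:08", "2020-02-11 00:00", "2020-01-09 10:08")
)
meds <- tibble(
  id           = c(1, 1, 2, 2),
  med_name_one = c("Co-amoxiclav", "doxycycline", "Gentamicin", "Co-trimoxazole"),
  administration_datetime = c("2020-01-03 10:08", "2020-01-01 11:08",
                              "2020-01-02 19:08", "2020-01-08 20:08")
)

# Assumption: timestamps are interpreted as UTC.
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
table_age <- mutate(table_age, across(c(start_date, end_date), to_time))
meds      <- mutate(meds, administration_datetime = to_time(administration_datetime))

# Hypothetical helper: keep only administrations inside [start, end].
keep_in_window <- function(med_df, start, end) {
  inside <- med_df$administration_datetime >= start &
    med_df$administration_datetime <= end
  med_df[inside, ]
}

# One list-column entry per interval, holding every administration for that id.
nested <- nest_join(table_age, meds, by = "id", name = "med_rows")

# Filter each nested tibble against its own interval. Iterating over row
# indices keeps start_date and end_date as POSIXct values.
nested$med_rows <- map(seq_len(nrow(nested)), function(i) {
  keep_in_window(nested$med_rows[[i]], nested$start_date[i], nested$end_date[i])
})

# keep_empty = TRUE turns intervals without any administration into a row of NAs.
result <- unnest(nested, med_rows, keep_empty = TRUE)
print(result)
```

The condition lives entirely in `keep_in_window`, so the same skeleton can be reused if the rule for matching an administration to an interval changes.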
You can use data.table, which allows non-equi joins.
Note that the character dates need to be converted to POSIXct for the non-equi join to work.
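As a small illustration of that conversion step, here is a sketch using one interval and one administration time from the question. The explicit `format` and `tz = "UTC"` arguments and the helper name `to_time` are assumptions added here; the answer itself relies on the defaults of as.POSIXct.

```r
# Character timestamps in the "YYYY-MM-DD HH:MM" layout used in the question.
start_chr <- "2020-01-01 00:08"
end_chr   <- "2020-01-04 10:08"
admin_chr <- "2020-01-03 10:08"

# Assumption: the timestamps are treated as UTC; adjust tz to your data.
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

start_time <- to_time(start_chr)
end_time   <- to_time(end_chr)
admin_time <- to_time(admin_chr)

# Once converted, the interval test is an ordinary comparison of time values.
inside <- admin_time >= start_time & admin_time <= end_time
print(inside)
```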
library(data.table)

# table_ is the second data set from the question (the 8-row tibble of administrations).
setDT(table_age)
setDT(table_)

# Add POSIXct copies of the character timestamps to join on.
table_[, administration_datetime_join := as.POSIXct(administration_datetime)]
table_age[, start_date_join := as.POSIXct(start_date)]
table_age[, end_date_join := as.POSIXct(end_date)]

# Non-equi join: every row of table_age is kept; an administration is attached only when it falls inside the interval.
table_[table_age,
       on = .(id = id,
              administration_datetime_join >= start_date_join,
              administration_datetime_join <= end_date_join)][
  , .(id, age_band, start_date, end_date, med_name_one, med_name_two, mg_one, mg_two, administration_datetime)]
   id age_band       start_date         end_date   med_name_one med_name_two   mg_one   mg_two administration_datetime
1:  1      5_9 2020-01-01 00:08 2020-01-04 10:08   Co-amoxiclav         <NA>   411 mg     <NA>        2020-01-03 10:08
2:  1      5_9 2020-01-01 00:08 2020-01-04 10:08    doxycycline   Gentamicin   120 mg 11280 mg        2020-01-01 11:08
3:  1      5_9 2020-02-01 00:00 2020-02-11 00:00           <NA>         <NA>     <NA>     <NA>                    <NA>
4:  2    10_14 2020-01-08 10:08 2020-01-09 10:08 Co-trimoxazole         <NA>     8 mg     <NA>        2020-01-08 20:08
5:  3    15-19 2020-01-02 17:08 2020-01-03 19:08     Gentamicin         <NA> 11280 mg     <NA>        2020-01-02 19:08
6:  4    20-24 2020-01-08 16:08 2020-01-11 16:08 Co-trimoxazole         <NA>     8 mg     <NA>        2020-01-08 20:08
7:  5      5_9 2020-01-10 08:08 2019-01-30 08:08           <NA>         <NA>     <NA>     <NA>                    <NA>
8:  6    10_14 2020-01-03 09:08 2020-01-05 09:08   Piperacillin         <NA>   120 mg     <NA>        2020-01-03 09:08
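Since the question asks for a tidyverse approach, the same kind of join can probably also be written with dplyr, assuming dplyr 1.1.0 or later, where join_by() accepts inequality conditions. This is a sketch on a reduced copy of the data (ids 1, 2 and 5), not a tested replacement for the data.table code above; the names `meds` and `to_time` and the UTC time zone are assumptions made for illustration.

```r
library(dplyr)

# Reduced copy of the question's data (assumption: ids 1, 2 and 5 only).
table_age <- tibble(
  id         = c(1, 1, 2, 5),
  age_band   = c("5_9", "5_9", "10_14", "5_9"),
  start_date = c("2020-01-01 00:08", "2020-02-01 00:00",
                 "2020-01-08 10:08", "2020-01-10 08:08"),
  end_date   = c("2020-01-04 10:08", "2020-02-11 00:00",
                 "2020-01-09 10:08", "2019-01-30 08:08")
)
meds <- tibble(
  id           = c(1, 1, 2, 2, 5),
  med_name_one = c("Co-amoxiclav", "doxycycline", "Gentamicin",
                   "Co-trimoxazole", "Sodium Chloride"),
  administration_datetime = c("2020-01-03 10:08", "2020-01-01 11:08",
                              "2020-01-02 19:08", "2020-01-08 20:08",
                              "2019-01-30 08:08")
)

# Assumption: timestamps are interpreted as UTC.
to_time <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")
table_age <- mutate(table_age, across(c(start_date, end_date), to_time))
meds      <- mutate(meds, administration_datetime = to_time(administration_datetime))

# Left join: every interval in table_age is kept. An administration is attached
# only when the id matches and start_date <= administration_datetime <= end_date.
result <- left_join(
  table_age, meds,
  by = join_by(id,
               start_date <= administration_datetime,
               end_date >= administration_datetime)
)
print(result)
```

Intervals with no matching administration (the second interval for id 1, and id 5, whose end_date precedes its start_date) come back with NA in the medication columns, while administrations that match no interval are dropped.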