How to bring rows into my table from a list inside the same table based on date conditions, with tidyverse?

Question

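I am trying to bring rows from a list into a tibble I have, based on conditions on several dates. I would like to solve this with the tidyverse.

This is the first dataset I have: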

 table_age <- structure(list(id = c(1, 1, 2, 3, 4, 5, 6), age_band = c("5_9", 
"5_9", "10_14", "15-19", "20-24", "5_9", "10_14"), start_date = c("2020-01-01 00:08", 
"2020-02-01 00:00", "2020-01-08 10:08", "2020-01-02 17:08", "2020-01-08 16:08", 
"2020-01-10 08:08", "2020-01-03 09:08"), end_date = c("2020-01-04 10:08", 
"2020-02-11 00:00", "2020-01-09 10:08", "2020-01-03 19:08", "2020-01-11 16:08", 
"2019-01-30 08:08", "2020-01-05 09:08")), row.names = c(NA, -7L
), class = c("tbl_df", "tbl", "data.frame"))

It looks like this:

table_age
# A tibble: 7 x 4
     id age_band start_date       end_date        
  <dbl> <chr>    <chr>            <chr>           
1     1 5_9      2020-01-01 00:08 2020-01-04 10:08
2     1 5_9      2020-02-01 00:00 2020-02-11 00:00
3     2 10_14    2020-01-08 10:08 2020-01-09 10:08
4     3 15-19    2020-01-02 17:08 2020-01-03 19:08
5     4 20-24    2020-01-08 16:08 2020-01-11 16:08
6     5 5_9      2020-01-10 08:08 2019-01-30 08:08
7     6 10_14    2020-01-03 09:08 2020-01-05 09:08

My second dataset is:

table <- structure(list(id = c(1, 1, 2, 2, 3, 4, 5, 6), med_name_one = c("Co-amoxiclav", 
"doxycycline", "Gentamicin", "Co-trimoxazole", "Gentamicin", 
"Co-trimoxazole", "Sodium Chloride", "Piperacillin"), med_name_two = c(NA, 
"Gentamicin", "Co-trimoxazole", NA, NA, NA, NA, NA), mg_one = c("411 mg", 
"120 mg", "11280 mg", "8 mg", "11280 mg", "8 mg", "411 mg", "120 mg"
), mg_two = c(NA, "11280 mg", "8 mg", NA, NA, NA, NA, NA), administration_datetime = c("2020-01-03 10:08", 
"2020-01-01 11:08", "2020-01-02 19:08", "2020-01-08 20:08", "2020-01-02 19:08", 
"2020-01-08 20:08", "2019-01-30 08:08", "2020-01-03 09:08")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))

This is how it looks:

    table
# A tibble: 8 x 6
     id med_name_one    med_name_two   mg_one   mg_two   administration_datetime
  <dbl> <chr>           <chr>          <chr>    <chr>    <chr>                  
1     1 Co-amoxiclav    NA             411 mg   NA       2020-01-03 10:08       
2     1 doxycycline     Gentamicin     120 mg   11280 mg 2020-01-01 11:08       
3     2 Gentamicin      Co-trimoxazole 11280 mg 8 mg     2020-01-02 19:08       
4     2 Co-trimoxazole  NA             8 mg     NA       2020-01-08 20:08       
5     3 Gentamicin      NA             11280 mg NA       2020-01-02 19:08       
6     4 Co-trimoxazole  NA             8 mg     NA       2020-01-08 20:08       
7     5 Sodium Chloride NA             411 mg   NA       2019-01-30 08:08       
8     6 Piperacillin    NA             120 mg   NA       2020-01-03 09:08

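The conditions under which I need to merge these two datasets are:

1. If administration_datetime in table falls within the interval between start_date and end_date in table_age, merge all the information from both tibbles.
2. If administration_datetime in table falls outside the interval between start_date and end_date in table_age, keep the id information from table_age and fill the columns that come from table with NA. Such rows are kept because they appear in table_age and I do not want to lose information from that table. This may seem to contradict point 3 below, but it does not.
3. However, if administration_datetime in table falls outside both date-time intervals in table_age, do not merge those rows at all. If I did, the interval date-times would be repeated, and I do not want that to happen.

This is the kind of table I want: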

table_answer
# A tibble: 8 x 9
     id med_name_one   med_name_two mg_one   mg_two   administration_datetime age_band start_date       end_date        
  <dbl> <chr>          <chr>        <chr>    <chr>    <chr>                   <chr>    <chr>            <chr>           
1     1 Co-amoxiclav   NA           411 mg   NA       2020-01-03 10:08        5_9      2020-01-01 00:08 2020-01-04 10:08
2     1 NA             NA           NA       NA       NA                      5_9      2020-02-01 00:00 2020-02-11 00:00
3     1 doxycycline    Gentamicin   120 mg   11280 mg 2020-01-01 11:08        5_9      2020-01-01 00:08 2020-01-04 10:08
4     2 Co-trimoxazole NA           8 mg     NA       2020-01-08 20:08        10_14    2020-01-08 10:08 2020-01-09 10:08
5     3 Gentamicin     NA           11280 mg NA       2020-01-02 19:08        15-19    2020-01-02 17:08 2020-01-03 19:08
6     4 Co-trimoxazole NA           8 mg     NA       2020-01-08 20:08        20-24    2020-01-08 16:08 2020-01-11 16:08
7     5 NA             NA           NA       NA       NA                      5_9      2020-01-10 08:08 2019-01-30 08:08
8     6 Piperacillin   NA           120 mg   NA       2020-01-03 09:08        10_14    2020-01-03 09:08 2020-01-05 09:08

I tried inner_join from the tidyverse, but without success:

table_joined <- inner_join(table_age, table)

This is the result:

table_joined
# A tibble: 10 x 9
      id age_band start_date       end_date        med_name_one    med_name_two   mg_one  mg_two  administration_dateti…
   <dbl> <chr>    <chr>            <chr>           <chr>           <chr>          <chr>   <chr>   <chr>                 
 1     1 5_9      2020-01-01 00:08 2020-01-04 10:… Co-amoxiclav    NA             411 mg  NA      2020-01-03 10:08      
 2     1 5_9      2020-01-01 00:08 2020-01-04 10:… doxycycline     Gentamicin     120 mg  11280 … 2020-01-01 11:08      
 3     1 5_9      2020-02-01 00:00 2020-02-11 00:… Co-amoxiclav    NA             411 mg  NA      2020-01-03 10:08      
 4     1 5_9      2020-02-01 00:00 2020-02-11 00:… doxycycline     Gentamicin     120 mg  11280 … 2020-01-01 11:08      
 5     2 10_14    2020-01-08 10:08 2020-01-09 10:… Gentamicin      Co-trimoxazole 11280 … 8 mg    2020-01-02 19:08      
 6     2 10_14    2020-01-08 10:08 2020-01-09 10:… Co-trimoxazole  NA             8 mg    NA      2020-01-08 20:08      
 7     3 15-19    2020-01-02 17:08 2020-01-03 19:… Gentamicin      NA             11280 … NA      2020-01-02 19:08      
 8     4 20-24    2020-01-08 16:08 2020-01-11 16:… Co-trimoxazole  NA             8 mg    NA      2020-01-08 20:08      
 9     5 5_9      2020-01-10 08:08 2019-01-30 08:… Sodium Chloride NA             411 mg  NA      2019-01-30 08:08      
10     6 10_14    2020-01-03 09:08 2020-01-05 09:… Piperacillin    NA             120 mg  NA      2020-01-03 09:08

I also tried nesting table into table_age, then extracting the rows I need with a function I wrote, applied through the map function from the purrr library:

table_nested <- nest_join(table_age, table)


get_medication_name <- function(medication_name_df) {

 medication_name <- medication_name_df %>%
 dplyr::group_by(id) %>%
 dplyr::arrange(administered_datetime) %>%
 pull(med_name_one)

}


table_answer <- mutate(medication_name = purrr::map(table_nested, get_medication_name))

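As far as I can tell, though, I am not setting any condition here; I was hoping this function would at least get me closer. On top of that, I get all sorts of errors.

Is there a way to achieve this? I would prefer a result along the lines of the second approach. I could of course go in other directions, but the second one is something I can build on with the broader code I already have.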

Answer 1

You can use data.table, which allows non-equi joins.

Note that the character dates need to be converted to POSIXct for the non-equi join to work.

library(data.table)

# table_age and table_ are the two datasets from the question
# (table_ is the second one, holding the medication rows)
setDT(table_age)
setDT(table_)

# Convert the character dates to POSIXct so the non-equi join can compare them
table_[, administration_datetime_join := as.POSIXct(administration_datetime)]
table_age[, start_date_join := as.POSIXct(start_date)]
table_age[, end_date_join := as.POSIXct(end_date)]

# Keep every row of table_age; medication columns are NA where nothing matches
table_[table_age,
       on = .(id = id,
              administration_datetime_join >= start_date_join,
              administration_datetime_join <= end_date_join)][
       , .(id, age_band, start_date, end_date, med_name_one, med_name_two, mg_one, mg_two, administration_datetime)]

   id age_band       start_date         end_date   med_name_one med_name_two   mg_one   mg_two administration_datetime
1:  1      5_9 2020-01-01 00:08 2020-01-04 10:08   Co-amoxiclav         <NA>   411 mg     <NA>        2020-01-03 10:08
2:  1      5_9 2020-01-01 00:08 2020-01-04 10:08    doxycycline   Gentamicin   120 mg 11280 mg        2020-01-01 11:08
3:  1      5_9 2020-02-01 00:00 2020-02-11 00:00           <NA>         <NA>     <NA>     <NA>                    <NA>
4:  2    10_14 2020-01-08 10:08 2020-01-09 10:08 Co-trimoxazole         <NA>     8 mg     <NA>        2020-01-08 20:08
5:  3    15-19 2020-01-02 17:08 2020-01-03 19:08     Gentamicin         <NA> 11280 mg     <NA>        2020-01-02 19:08
6:  4    20-24 2020-01-08 16:08 2020-01-11 16:08 Co-trimoxazole         <NA>     8 mg     <NA>        2020-01-08 20:08
7:  5      5_9 2020-01-10 08:08 2019-01-30 08:08           <NA>         <NA>     <NA>     <NA>                    <NA>
8:  6    10_14 2020-01-03 09:08 2020-01-05 09:08   Piperacillin         <NA>   120 mg     <NA>        2020-01-03 09:08
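
Since the question asks for a tidyverse solution, the same non-equi join can probably be sketched with dplyr as well. This is only a sketch under some assumptions: it needs dplyr 1.1.0 or later (which introduced join_by() with inequality conditions), the second dataset is called meds here instead of table, only two of its medication columns are reproduced to keep the example short, and the helper parse_dt, the temporary columns start_dt, end_dt and admin_dt, and the UTC time zone are illustrative choices that do not come from the original post.

```r
library(dplyr)

# First dataset from the question
table_age <- tibble::tibble(
  id = c(1, 1, 2, 3, 4, 5, 6),
  age_band = c("5_9", "5_9", "10_14", "15-19", "20-24", "5_9", "10_14"),
  start_date = c("2020-01-01 00:08", "2020-02-01 00:00", "2020-01-08 10:08",
                 "2020-01-02 17:08", "2020-01-08 16:08", "2020-01-10 08:08",
                 "2020-01-03 09:08"),
  end_date = c("2020-01-04 10:08", "2020-02-11 00:00", "2020-01-09 10:08",
               "2020-01-03 19:08", "2020-01-11 16:08", "2019-01-30 08:08",
               "2020-01-05 09:08")
)

# Second dataset from the question (hypothetical name, subset of columns)
meds <- tibble::tibble(
  id = c(1, 1, 2, 2, 3, 4, 5, 6),
  med_name_one = c("Co-amoxiclav", "doxycycline", "Gentamicin", "Co-trimoxazole",
                   "Gentamicin", "Co-trimoxazole", "Sodium Chloride", "Piperacillin"),
  mg_one = c("411 mg", "120 mg", "11280 mg", "8 mg",
             "11280 mg", "8 mg", "411 mg", "120 mg"),
  administration_datetime = c("2020-01-03 10:08", "2020-01-01 11:08",
                              "2020-01-02 19:08", "2020-01-08 20:08",
                              "2020-01-02 19:08", "2020-01-08 20:08",
                              "2019-01-30 08:08", "2020-01-03 09:08")
)

# Illustrative helper: parse "YYYY-MM-DD HH:MM" strings (UTC is an assumption)
parse_dt <- function(x) as.POSIXct(x, format = "%Y-%m-%d %H:%M", tz = "UTC")

result <- table_age %>%
  mutate(start_dt = parse_dt(start_date), end_dt = parse_dt(end_date)) %>%
  left_join(
    meds %>% mutate(admin_dt = parse_dt(administration_datetime)),
    # In join_by(), the left side of each condition refers to table_age
    # and the right side to meds
    by = join_by(id, start_dt <= admin_dt, end_dt >= admin_dt)
  ) %>%
  select(-start_dt, -end_dt, -admin_dt)

result
```

Using left_join() with table_age on the left keeps every interval row, so intervals with no matching administration get NA in the medication columns, while administrations that fall outside every interval for their id are dropped.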
