Lagging in R when a condition is matched

Question

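I have a data frame that contains only checkup dates and infection status (yes/no), and I want to add a third column giving the date of the last infection. If the patient has had no previous infection, the new last_infection column should be NA. If they have been infected before, it should show the date of the most recent earlier visit at which they tested "yes" for infection.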

I would like the output to look like this:

date      infection   last_infection
01-01-18  no          NA
06-01-18  no          NA
07-01-18  yes         NA
09-01-18  no          07-01-18
01-01-19  no          07-01-18
02-01-19  yes         07-01-18
03-01-19  yes         02-01-19
04-01-19  no          03-01-19
05-01-19  no          03-01-19

How can I do this in R? Can a function like lag() check a condition, or should I be doing something else entirely?

Answer 1

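We can create a grouping variable from a logical vector built on 'infection' (the cumulative sum of infection == "yes"), take the first date in each group, and then lag that column. Here, we load only dplyr and no other package.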

library(dplyr)
df1 %>%
   group_by(grp = cumsum(infection == "yes")) %>%
   mutate(new = first(date)) %>%
   ungroup %>%
   mutate(new = replace(lag(new), seq_len(match(1, grp)), NA)) %>%
   select(-grp)
# A tibble: 9 x 4
#  date     infection last_infection new     
#  <chr>    <chr>     <chr>          <chr>   
#1 01-01-18 no        <NA>           <NA>    
#2 06-01-18 no        <NA>           <NA>    
#3 07-01-18 yes       <NA>           <NA>    
#4 09-01-18 no        07-01-18       07-01-18
#5 01-01-19 no        07-01-18       07-01-18
#6 02-01-19 yes       07-01-18       07-01-18
#7 03-01-19 yes       02-01-19       02-01-19
#8 04-01-19 no        03-01-19       03-01-19
#9 05-01-19 no        03-01-19       03-01-19
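
To see why the grouping works, it can help to look at the intermediate columns one at a time. The following is a minimal sketch, not part of the original answer: it rebuilds the question's sample data inline, and the names steps, group_start, and lagged are made up for illustration.

```r
library(dplyr)

# Sample data from the question (date and infection columns only)
df1 <- data.frame(
  date = c("01-01-18", "06-01-18", "07-01-18", "09-01-18", "01-01-19",
           "02-01-19", "03-01-19", "04-01-19", "05-01-19"),
  infection = c("no", "no", "yes", "no", "no", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

steps <- df1 %>%
  # grp increases by one at every "yes" row, so each group from 1 onward
  # starts at an infection visit: 0 0 1 1 1 2 3 3 3
  mutate(grp = cumsum(infection == "yes")) %>%
  group_by(grp) %>%
  # first date in each group: the infection date (or the first visit for grp 0)
  mutate(group_start = first(date)) %>%
  ungroup() %>%
  # shifting down one row makes an infection visible only from the next visit on
  mutate(lagged = lag(group_start))

print(steps)
```

In this sketch, rows 2 and 3 of lagged still carry the first visit date from group 0, which is not an infection. That is the reason the answer blanks out the leading rows with replace(..., seq_len(match(1, grp)), NA).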

Data

df1 <- structure(list(date = c("01-01-18", "06-01-18", "07-01-18", "09-01-18", 
"01-01-19", "02-01-19", "03-01-19", "04-01-19", "05-01-19"), 
    infection = c("no", "no", "yes", "no", "no", "yes", "yes", 
    "no", "no"), last_infection = c(NA, NA, NA, "07-01-18", "07-01-18", 
    "07-01-18", "02-01-19", "03-01-19", "03-01-19")),
    class = "data.frame", row.names = c(NA, 
-9L))

Answer 2

I would suggest doing it this way instead. If you use fill from the tidyr package, there is no reason to use cumsum or grouping.

library(tidyverse)

# df is assumed to hold the question's data (date and infection columns)
df %>% 
  mutate(
    last_infection = if_else(lag(infection) == "yes", lag(date), NA_character_)
  ) %>% 
  fill(last_infection)
#> # A tibble: 9 x 3
#>   date     infection last_infection
#>   <chr>    <chr>     <chr>         
#> 1 01-01-18 no        <NA>          
#> 2 06-01-18 no        <NA>          
#> 3 07-01-18 yes       <NA>          
#> 4 09-01-18 no        07-01-18      
#> 5 01-01-19 no        07-01-18      
#> 6 02-01-19 yes       07-01-18      
#> 7 03-01-19 yes       02-01-19      
#> 8 04-01-19 no        03-01-19      
#> 9 05-01-19 no        03-01-19

Created on 2020-01-25 by the reprex package (v0.3.0)
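
The two steps in this approach can also be inspected separately. Below is a minimal sketch, assuming the same sample data as the question rebuilt inline; the names before_fill and after_fill are made up for illustration and do not appear in the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (date and infection columns only)
df <- data.frame(
  date = c("01-01-18", "06-01-18", "07-01-18", "09-01-18", "01-01-19",
           "02-01-19", "03-01-19", "04-01-19", "05-01-19"),
  infection = c("no", "no", "yes", "no", "no", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

before_fill <- df %>%
  mutate(
    # keep the previous visit's date only when that visit tested "yes";
    # the first row has no previous visit, so its condition is NA and it stays NA
    last_infection = if_else(lag(infection) == "yes", lag(date), NA_character_)
  )

after_fill <- before_fill %>%
  # carry each non-missing date downward until the next one appears
  fill(last_infection)

print(before_fill$last_infection)
print(after_fill$last_infection)
```

Before fill, a date appears only on the row immediately after a "yes" visit. fill then propagates it down (its default direction), and the leading NA values are left alone because there is nothing above them to carry forward.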
