Mutate a variable based on other columns with similar names

I have a df here (this is the desired output; my starting df does not have the Flag variable):

df <- data.frame(
  Person = c('1','2','3'),
  Date = as.Date(c('2010-09-30', '2012-11-20', '2015-03-11')),
  Treatment_1 = as.Date(c('2010-09-30', '2012-11-21', '2015-03-22')),
  Treatment_2 = as.Date(c('2011-09-30', NA, '2011-03-22')),
  Treatment_3 = as.Date(c('2012-09-30', '2015-11-21', '2015-06-22')),
  Surgery_1 = as.Date(c(NA, '2016-11-21', '2015-03-12')),
  Surgery_2 = as.Date(c(NA, '2017-11-21', '2019-03-12')),
  Surgery_3 = as.Date(c(NA, '2018-11-21', '2013-03-12')),
  Flag = c('', 'Y', '') 
)

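I want to derive the Flag variable based on these conditions:

  1. For any column starting with Treatment, if Date = Treatment, set Flag to "".
  2. For any column starting with Surgery, if Date = Surgery OR Date = Surgery + 1 OR Date = Surgery - 1 (basically, if the Surgery date is the same day, the day before, or the day after the Date variable), set Flag to "".
  3. Otherwise set Flag = "Y".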

I looked at mutate_at, but it overwrites the variables and assigns TRUE/FALSE values.

This is wrong, but here is my attempt:

df2 <- df %>%
  mutate(Flag = case_when(
    vars(starts_with("Treatment"), Date == . ) ~ '',
    vars(starts_with("Surgery"), Date == . | Date == . - 1 | Date == . + 1) ~ '',
    TRUE ~ 'Y')
  )

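For each condition in case_when, we can use rowwise with c_across, together with any. Then we can build a vector of Date (plus the +1 and -1 day values) for the Surgery columns to match against.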

library(tidyverse)

df %>%
  rowwise() %>%
  mutate(Flag = case_when(
    any(c_across(starts_with("Treatment")) == Date) ~ "",
    any(c_across(starts_with("Surgery")) %in% c(Date, (Date +1), (Date-1))) ~ "",
    TRUE ~ "Y"
  ))

Output

  Person Date       Treatment_1 Treatment_2 Treatment_3 Surgery_1  Surgery_2  Surgery_3  Flag 
  <chr>  <date>     <date>      <date>      <date>      <date>     <date>     <date>     <chr>
1 1      2010-09-30 2010-09-30  2011-09-30  2012-09-30  NA         NA         NA         ""   
2 2      2012-11-20 2012-11-21  NA          2015-11-21  2016-11-21 2017-11-21 2018-11-21 "Y"  
3 3      2015-03-11 2015-03-22  2011-03-22  2015-06-22  2015-03-12 2019-03-12 2013-03-12 "" 
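
One detail worth checking in the rowwise approach is how missing dates behave. The base R sketch below uses made-up values (date and treatments are hypothetical names, not the question's data). It shows that any() over an == comparison can return NA when nothing matches and a value is missing, while %in% never returns NA. case_when treats an NA condition as not matched, so such a row simply falls through to the next condition.

```r
# Hypothetical values for illustration only
date <- as.Date("2012-11-20")
treatments <- as.Date(c("2012-11-21", NA, "2015-11-21"))

# == propagates NA, so any() is NA when there is no TRUE but a value is missing
eq_result <- any(treatments == date)

# na.rm = TRUE turns that into a plain FALSE
eq_result_narm <- any(treatments == date, na.rm = TRUE)

# %in% never returns NA, so no special handling is needed
in_result <- any(treatments %in% c(date, date + 1, date - 1))

print(eq_result)       # NA
print(eq_result_narm)  # FALSE
print(in_result)       # TRUE, because 2012-11-21 is date + 1
```

If an explicit FALSE is preferred over relying on case_when's handling of NA, na.rm = TRUE can be passed to any() in the first condition.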

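Update

Here is a possible base R solution, which is much faster than the tidyverse one. It could be done in a single line of code, but I think this version is more readable. First, I duplicate the Surgery columns so that we have +1 day and -1 day versions, then convert those columns to character. Next, I subset the Treatment columns and convert them to character as well. I convert to character because I could not compare Date with %in% or == directly here. Then I use ifelse: if Date is found in any of those columns we return "", and if not we return "Y". Finally, I bind the result back to the original data frame (minus Flag from the original data frame).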

dup_names <- colnames(df)[startsWith(colnames(df), "Surgery")]

surgery <-
  cbind(df[dup_names], setNames(df[dup_names] + 1, paste0(dup_names, "_range1")))

surgery <-
  sapply(cbind(surgery, setNames(df[dup_names] - 1, paste0(
    dup_names, "_range2"
  ))), as.character)

treatment <-
  sapply(df[startsWith(colnames(df), "Treatment")], as.character)

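# Compare each row's Date only with the treatment/surgery values in the same row
# (the Date vector is recycled down each column of the character matrix)
candidates <- cbind(treatment, surgery)

cbind(subset(df, select = -Flag),
      Flag = ifelse(rowSums(candidates == as.character(df[, 2]), na.rm = TRUE) > 0, "", "Y"))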
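
The conversion to character can be motivated with a short base R check. This is only a sketch with a made-up two-row data frame (toy, surg_cols, shifted and hit are hypothetical names): sapply simplifies Date columns to a plain numeric matrix, dropping the Date class, whereas converting with as.character keeps readable date strings that can be compared cell by cell against the Date column.

```r
# Hypothetical two-row example, not the question's data
toy <- data.frame(
  Date      = as.Date(c("2020-01-01", "2020-06-15")),
  Surgery_1 = as.Date(c("2020-01-02", NA)),
  Surgery_2 = as.Date(c(NA, "2021-01-01"))
)
surg_cols <- c("Surgery_1", "Surgery_2")

# sapply() simplifies to a numeric matrix: the Date class is lost
as_numbers <- sapply(toy[surg_cols], identity)

# Converting each column to character keeps readable dates
as_strings <- sapply(toy[surg_cols], as.character)

# Row-wise check: the Date vector is recycled down each column,
# so cell [i, j] is compared with toy$Date[i]
shifted <- sapply(toy[surg_cols] - 1, as.character)  # surgery date minus one day
hit <- rowSums(shifted == as.character(toy$Date), na.rm = TRUE) > 0

print(is.numeric(as_numbers))  # TRUE
print(as_strings)
print(hit)                     # TRUE FALSE
```

Comparing the matrix with == keeps the check within each row, whereas %in% against the whole matrix would also accept a date that only appears in another person's row.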

Benchmark

Update: adding a data.table approach

If you want a data.table approach, here it is:

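library(data.table)
library(stringr)

setDT(df)  # assumes df is the starting data (no Flag column yet)

df[melt(df, id=c(1,2))[,flag:=fifelse(
  (str_starts(variable,"T") & value==Date) | 
    (str_starts(variable,"S") & abs(value-Date)<=1),"", "Y")][
      , .(flag=min(flag,na.rm=TRUE)), Person], on=.(Person)]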

Output

   Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1:      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2:      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3:      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     
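
The grouped min(flag) step works because of string ordering: the empty string sorts before "Y", so the minimum within a person is "" as soon as one column matched. A minimal base R sketch with made-up flag vectors (all names here are hypothetical):

```r
# Hypothetical per-column flags for three people (illustration only)
flags_one_match <- c("Y", "", "Y", NA)  # one column matched, one comparison was NA
flags_no_match  <- c("Y", "Y", "Y")     # nothing matched
flags_two_match <- c("", "", "Y")       # several columns matched

# "" sorts before "Y", so min() returns "" whenever at least one match exists
res_one  <- min(flags_one_match, na.rm = TRUE)
res_none <- min(flags_no_match, na.rm = TRUE)
res_two  <- min(flags_two_match, na.rm = TRUE)

print(c(res_one, res_none, res_two))  # "" "Y" ""
```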

I like Andrew's approach, but I was already working on this when his answer came in, so in case you are interested, here it is:

library(tidyverse)

df %>% inner_join(
  pivot_longer(df, cols=Treatment_1:Surgery_3) %>% 
    mutate(flag=case_when(
        (str_starts(name,"T") & value==Date) | (str_starts(name,"S") & abs(value-Date)<=1) ~ "",
        TRUE ~"Y")) %>% 
    group_by(Person) %>% 
    summarize(flag = min(flag)),
  by = "Person"
)

Output:

  Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3 flag
1      1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>     
2      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21    Y
3      3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12     

Here is an alternative using across:

library(tidyverse)

df %>% 
  # 1 if the value matches Date in the same row, 0 otherwise (NA counts as no match)
  mutate(across(starts_with("Treatment"), ~as.numeric(!is.na(.) & . == Date), .names ="new_{.col}"),
         across(starts_with("Surgery"), ~as.numeric(!is.na(.) & abs(as.numeric(. - Date)) <= 1), .names ="new_{.col}")) %>% 
  # one or more matches clears the flag
  mutate(Flag = ifelse(rowSums(select(., contains('new'))) > 0, "", "Y"), .keep="used") %>% 
  bind_cols(df)

Output:

  Flag Person       Date Treatment_1 Treatment_2 Treatment_3  Surgery_1  Surgery_2  Surgery_3
1           1 2010-09-30  2010-09-30  2011-09-30  2012-09-30       <NA>       <NA>       <NA>
2    Y      2 2012-11-20  2012-11-21        <NA>  2015-11-21 2016-11-21 2017-11-21 2018-11-21
3           3 2015-03-11  2015-03-22  2011-03-22  2015-06-22 2015-03-12 2019-03-12 2013-03-12