Combining ifelse and any to create new column - R 3.3.2 Windows 7

I have a data.table, and I'm trying to create a new column by checking whether a row has a particular value in any of a given set of columns.

head(d1)

   MEDREC_KEY   pat_key           drug1          drug2          drug3       drug4        drug5       drug6      drug7     drug8 drug9 drug10 drug11 drug12
1: -140665983 669723105 Anti-infectives Cephalosporins     Ethambutol   Isoniazid   Macrolides Penicillins Quinolones Rifamycin    NA     NA     NA     NA
2: -606290573  85924804 Anti-infectives   Beta-lactams Cephalosporins Penicillins   Quinolones          NA         NA        NA    NA     NA     NA     NA
3: -615873176 161009395  Cephalosporins    Penicillins             NA          NA           NA          NA         NA        NA    NA     NA     NA     NA
4: -616819481  36280536 Anti-infectives Cephalosporins     Macrolides  Quinolones           NA          NA         NA        NA    NA     NA     NA     NA
5: -625709819 720290063 Anti-infectives Cephalosporins     Ethambutol   Isoniazid Pyrazinamide  Quinolones  Rifamycin        NA    NA     NA     NA     NA
6: -637094857 720918635 Anti-infectives    Penicillins     Quinolones          NA           NA          NA         NA        NA    NA     NA     NA     NA

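What I want is: if any of the "drug" columns == "Macrolides" and any of the same columns == "Cephalosporins", then my new column "correct" == 1, otherwise "correct" == 0 (a logical TRUE/FALSE would also be fine), like this: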

head(d1)
   MEDREC_KEY   pat_key           drug1          drug2          drug3       drug4        drug5       drug6      drug7     drug8 drug9 drug10 drug11 drug12 correct
1: -140665983 669723105 Anti-infectives Cephalosporins     Ethambutol   Isoniazid   Macrolides Penicillins Quinolones Rifamycin    NA     NA     NA     NA   1
2: -606290573  85924804 Anti-infectives   Beta-lactams Cephalosporins Penicillins   Quinolones          NA         NA        NA    NA     NA     NA     NA   0
3: -615873176 161009395  Cephalosporins    Penicillins             NA          NA           NA          NA         NA        NA    NA     NA     NA     NA   0
4: -616819481  36280536 Anti-infectives Cephalosporins     Macrolides  Quinolones           NA          NA         NA        NA    NA     NA     NA     NA   1
5: -625709819 720290063 Anti-infectives Cephalosporins     Ethambutol   Isoniazid Pyrazinamide  Quinolones  Rifamycin        NA    NA     NA     NA     NA   0
6: -637094857 720918635 Anti-infectives    Penicillins     Quinolones          NA           NA          NA         NA        NA    NA     NA     NA     NA   0

I've tried these two approaches (but I'm still learning how to decipher warning messages, so they haven't helped much, especially since I'm new to data.table):

> d1$correct<-ifelse(d1[,c(3:14)]=="Macrolides" | d1[,c(3:14)]=="Cephalosporins", 1, 0)
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
  12 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(x, j = name, value = value) :
  Supplied 56868 items to be assigned to 4739 items of column 'correct' (52129 unused)
> 
> 
> selected_cols<-c("drug1", "drug2", "drug3", "drug4", "drug5", "drug6", "drug7", "drug8", "drug9", "drug10", "drug11", "drug12")
> d1$correct<-ifelse(d1 %in% selected_cols=="Macrolides" | d1 %in% selected_cols=="Cephalosporins", 1, 0)
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
  Supplied 16 items to be assigned to 4739 items of column 'correct' (recycled leaving remainder of 3 items).
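To see where the numbers in those warnings come from, here is a small base-R sketch on a hypothetical 3-row stand-in for d1 (the data frame d and its values are made up for illustration). Comparing a whole block of columns yields a logical matrix with one cell per row and column, so ifelse() returns nrow x ncol values; with 4739 rows and 12 drug columns that is the 56868 items in the first warning. Applying %in% to the table itself treats it as a list of columns, so it returns one value per column, which is presumably why the second warning reports 16 items (the column count of the full d1) rather than 4739.

```r
# Hypothetical 3-row stand-in for d1: two key columns plus two drug columns
d <- data.frame(MEDREC_KEY = c(-101, -102, -103),
                pat_key    = c(669, 668, 667),
                drug1      = c("Macrolides", NA, "Macrolides"),
                drug2      = c(NA, "Cephalosporins", "Cephalosporins"),
                stringsAsFactors = FALSE)

# Attempt 1: comparing a block of columns returns a logical MATRIX
# (one cell per row x column), so ifelse() returns nrow * ncol values
m <- d[, 3:4] == "Macrolides" | d[, 3:4] == "Cephalosporins"
print(dim(m))                   # 3 2
print(length(ifelse(m, 1, 0)))  # 6, not 3

# Attempt 2: %in% treats the data frame as a list of columns, so it returns
# one value per COLUMN (each whole column compared against the names)
selected_cols <- c("drug1", "drug2")
print(length(d %in% selected_cols))  # 4 = ncol(d), not nrow(d)
```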

The closest I've gotten is:

d1$correct<-apply(d1, 1, function(r) any(r %in% c("Macrolides", "Cephalosporins")))

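This gives TRUE if either one is true across the columns, but I can't figure out how to get TRUE only when both are true across the columns. I'd rather not use a large number of ifelse statements, since I have 12 columns and many more combinations I need to check, and the NAs would throw it off anyway.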
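A minimal base-R sketch of why that attempt answers "either" rather than "both", using the drugs from row 2 of the sample as a plain character vector (r and required are illustrative names). any(r %in% required) asks whether at least one drug in the row is one of the targets, while all(required %in% r) asks whether every target appears somewhere in the row; the second form works for any number of required drugs and is unaffected by NAs.

```r
# Row 2 of the sample data, as a character vector (NAs included)
r <- c("Anti-infectives", "Beta-lactams", "Cephalosporins", "Penicillins",
       "Quinolones", NA, NA, NA, NA, NA, NA, NA)
required <- c("Macrolides", "Cephalosporins")

# "Either": is at least one element of the row among the targets?
either <- any(r %in% required)   # TRUE, Cephalosporins is present

# "Both": is every target found somewhere in the row?
both <- all(required %in% r)     # FALSE, Macrolides is missing

print(c(either = either, both = both))
```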

I like dplyr or data.table solutions because they're very elegant, but at this point I'm desperate.

This should work:

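# na.rm = TRUE: without it, any() returns NA (not FALSE) for rows that contain NA but no match
d1$correct <- apply(d1, 1, function(r) { any(r == "Macrolides", na.rm = TRUE) & any(r == "Cephalosporins", na.rm = TRUE) })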
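Since the question asks for data.table, here is a sketch of a vectorized alternative that avoids apply() (which converts the whole table, key columns included, to a character matrix). It assumes the drug columns can be selected by the name pattern "^drug"; the sample data and the names drug_cols and required are hypothetical. Each drug column is tested with %in%, which returns FALSE rather than NA for missing values, Reduce(`|`, ...) combines the columns row by row, and Reduce(`&`, ...) then requires every drug in required, so adding more drugs only means extending that vector.

```r
library(data.table)

# Hypothetical sample in the shape of d1
d1 <- data.table(MEDREC_KEY = c(-101, -102, -103),
                 pat_key    = c(669, 668, 667),
                 drug1      = c("Macrolides", NA, "Macrolides"),
                 drug2      = c(NA, "Cephalosporins", "Cephalosporins"))

drug_cols <- grep("^drug", names(d1), value = TRUE)
required  <- c("Macrolides", "Cephalosporins")

d1[, correct := {
  # one logical vector per required drug: does it appear in any drug column?
  per_drug <- lapply(required, function(drug)
    Reduce(`|`, lapply(.SD, function(col) col %in% drug)))
  # 1 only when every required drug was found in the row, else 0
  as.integer(Reduce(`&`, per_drug))
}, .SDcols = drug_cols]

print(d1)
```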

Here's an idea:

library(dplyr)
library(tidyr)

d1 %>%
  gather(key, value, -MEDREC_KEY, -pat_key) %>%
  group_by(MEDREC_KEY, pat_key) %>%
  mutate(correct = +all(c("Macrolides", "Cephalosporins") %in% value)) %>%
  spread(key, value)
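In current tidyr, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same idea, assuming tidyr >= 1.0 and dplyr >= 1.0 (for summarise(.groups = )) and assuming that MEDREC_KEY plus pat_key identifies each row uniquely: the flag is computed in long format and joined back, so the wide drug columns keep their original order instead of being re-sorted by spread(). The sample data is hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample in the shape of d1
d1 <- tibble(MEDREC_KEY = c(-101, -102, -103),
             pat_key    = c(669, 668, 667),
             drug1      = c("Macrolides", NA, "Macrolides"),
             drug2      = c(NA, "Cephalosporins", "Cephalosporins"))

flags <- d1 %>%
  # one row per (patient, drug column) pair
  pivot_longer(starts_with("drug"), names_to = "key", values_to = "value") %>%
  group_by(MEDREC_KEY, pat_key) %>%
  # unary + turns the logical result into 1/0
  summarise(correct = +all(c("Macrolides", "Cephalosporins") %in% value),
            .groups = "drop")

# attach the flag to the original wide table, keeping its row order
result <- left_join(d1, flags, by = c("MEDREC_KEY", "pat_key"))
print(result)
```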

OK, I built an example and tried it out. It's not a dplyr/tidyr approach, though.

d1 <- data.table::data.table(x_key = c(-101,-102,-103), y_key = c(669,668,667), 
                            drug1 = c("Macrolides",NA,"Macrolides"), 
                            drug2 = c(NA, "Cephalosporins", "Cephalosporins"))

   x_key y_key      drug1          drug2
1:  -101   669 Macrolides             NA
2:  -102   668         NA Cephalosporins
3:  -103   667 Macrolides Cephalosporins

d1$correct <- rowSums(apply(d1, 2, function(r) (r %in% c("Macrolides", "Cephalosporins")))[,-c(1:2)]*1)>=2
d1
   x_key y_key      drug1          drug2  correct
1:  -101   669 Macrolides             NA    FALSE
2:  -102   668         NA Cephalosporins    FALSE
3:  -103   667 Macrolides Cephalosporins     TRUE

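The problem you're running into is that you're calling apply with MARGIN 1, when what you really want is MARGIN 2. Because this checks for at least 2 TRUEs, it only works the way you want if the same drug never appears twice in a row (e.g., 2 Macrolides would count as 2 matches, so correct == TRUE).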
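One way around that double-counting caveat is to count each required drug at most once per row: first ask, for each drug, whether it appears in any drug column, then require all of them. A base-R sketch with a hypothetical sample whose second row lists Macrolides twice and has no Cephalosporins; drug_cols and required are illustrative names.

```r
# Hypothetical sample: row 2 repeats Macrolides and has no Cephalosporins
d <- data.frame(x_key = c(-101, -102, -103),
                y_key = c(669, 668, 667),
                drug1 = c("Macrolides", "Macrolides", "Macrolides"),
                drug2 = c(NA, "Macrolides", "Cephalosporins"),
                stringsAsFactors = FALSE)

drug_cols <- c("drug1", "drug2")
required  <- c("Macrolides", "Cephalosporins")

# Counting matching cells (as in the answer above): row 2 scores 2 and passes
naive <- rowSums(sapply(d[drug_cols], function(col) col %in% required)) >= 2

# Per-drug presence: each required drug must appear somewhere in the row,
# no matter how many times it is repeated
d$correct <- Reduce(`&`, lapply(required, function(drug)
  Reduce(`|`, lapply(d[drug_cols], function(col) col %in% drug))))

print(data.frame(naive = naive, correct = d$correct))
```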