Combining ifelse and any to create new column - R 3.3.2 Windows 7
I have a data.table, and I am trying to create a new column by checking whether a row has particular values in any of a given set of columns.
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA
What I want is this: if any of the "drug" columns == "Macrolides" and any of those same columns == "Cephalosporins", then my new column "correct" == 1, otherwise "correct" == 0 (or it could be a logical), like so:
head(d1)
MEDREC_KEY pat_key drug1 drug2 drug3 drug4 drug5 drug6 drug7 drug8 drug9 drug10 drug11 drug12 correct
1: -140665983 669723105 Anti-infectives Cephalosporins Ethambutol Isoniazid Macrolides Penicillins Quinolones Rifamycin NA NA NA NA 1
2: -606290573 85924804 Anti-infectives Beta-lactams Cephalosporins Penicillins Quinolones NA NA NA NA NA NA NA 0
3: -615873176 161009395 Cephalosporins Penicillins NA NA NA NA NA NA NA NA NA NA 0
4: -616819481 36280536 Anti-infectives Cephalosporins Macrolides Quinolones NA NA NA NA NA NA NA NA 1
5: -625709819 720290063 Anti-infectives Cephalosporins Ethambutol Isoniazid Pyrazinamide Quinolones Rifamycin NA NA NA NA NA 0
6: -637094857 720918635 Anti-infectives Penicillins Quinolones NA NA NA NA NA NA NA NA NA 0
I have tried these two approaches (I'm still learning how to decipher warning messages, so these weren't much help, especially since I'm new to data.table):
> d1$correct<-ifelse(d1[,c(3:14)]=="Macrolides" | d1[,c(3:14)]=="Cephalosporins", 1, 0)
Warning messages:
1: In `[<-.data.table`(x, j = name, value = value) :
12 column matrix RHS of := will be treated as one vector
2: In `[<-.data.table`(x, j = name, value = value) :
Supplied 56868 items to be assigned to 4739 items of column 'correct' (52129 unused)
>
>
> selected_cols<-c("drug1", "drug2", "drug3", "drug4", "drug5", "drug6", "drug7", "drug8", "drug9", "drug10", "drug11", "drug12")
> d1$correct<-ifelse(d1 %in% selected_cols=="Macrolides" | d1 %in% selected_cols=="Cephalosporins", 1, 0)
Warning message:
In `[<-.data.table`(x, j = name, value = value) :
Supplied 16 items to be assigned to 4739 items of column 'correct' (recycled leaving remainder of 3 items).
The closest I've gotten is:
d1$correct<-apply(d1, 1, function(r) any(r %in% c("Macrolides", "Cephalosporins")))
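which gives TRUE if either one of them is present across the columns, but I can't figure out how to require that both are present across the columns. I'd rather not use a pile of ifelse statements, since I have 12 columns and many more combinations to check, and the NAs throw it off anyway.
I'd love a dplyr or data.table solution since they're so elegant, but at this point I'm desperate.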
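This should work (na.rm = TRUE keeps rows that contain NA from returning NA instead of FALSE):
d1$correct <- apply(d1, 1, function(r) any(r == "Macrolides", na.rm = TRUE) && any(r == "Cephalosporins", na.rm = TRUE))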
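As a variation on the row-wise check, the test can be limited to the drug columns and made NA-safe by using %in% instead of ==. This is only a minimal base R sketch under assumptions: the toy data frame `d`, the helper `has_drug`, and the `^drug` name pattern are hypothetical and not from the original post.

```r
# Toy data modelled on the question; the values are illustrative only.
d <- data.frame(
  MEDREC_KEY = c(-1, -2, -3, -4),
  pat_key    = c(11, 12, 13, 14),
  drug1 = c("Anti-infectives", "Cephalosporins", "Macrolides", "Macrolides"),
  drug2 = c("Cephalosporins", "Penicillins", NA, "Macrolides"),
  drug3 = c("Macrolides", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Assumption: every drug column name starts with "drug".
drug_cols <- grep("^drug", names(d), value = TRUE)

# Hypothetical helper: TRUE for each row where `drug` appears in any of `cols`.
# %in% never returns NA, so missing values simply count as "not present".
has_drug <- function(data, cols, drug) {
  Reduce(`|`, lapply(cols, function(cl) data[[cl]] %in% drug))
}

d$correct <- as.integer(
  has_drug(d, drug_cols, "Macrolides") & has_drug(d, drug_cols, "Cephalosporins")
)

d$correct
# [1] 1 0 0 0
```

Because the helper only uses `[[`, it should behave the same on a data.frame or a data.table. More drug combinations can be checked by calling it once per drug and combining the results with `&`.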
Here's an idea:
library(dplyr)
library(tidyr)
df %>%
  gather(key, value, -MEDREC_KEY, -pat_key) %>%
  group_by(MEDREC_KEY, pat_key) %>%
  mutate(correct = +all(c("Macrolides", "Cephalosporins") %in% value)) %>%
  spread(key, value)
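To make the long-format idea easier to follow, here is a rough base R sketch of the same logic: stack the drug columns, then ask per id whether both target drugs appear among that id's values. The toy data frame `wide` and the names `long`, `targets` and `flag` are made up for illustration. This is not the dplyr/tidyr code itself, just the same reasoning written out.

```r
# Toy wide data; one row per id (assumed unique), values are illustrative.
wide <- data.frame(
  id    = c(1, 2, 3),
  drug1 = c("Macrolides", "Cephalosporins", "Macrolides"),
  drug2 = c("Cephalosporins", NA, NA),
  stringsAsFactors = FALSE
)
drug_cols <- c("drug1", "drug2")
targets   <- c("Macrolides", "Cephalosporins")

# "Gather": stack the drug columns into one value column, repeating the id.
long <- data.frame(
  id    = rep(wide$id, times = length(drug_cols)),
  value = unlist(wide[drug_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# "Group by id": 1 if every target drug is among that id's values, else 0.
flag <- tapply(long$value, long$id, function(v) +all(targets %in% v))

# Attach the flag back to the wide table, matching on id.
wide$correct <- as.integer(flag[as.character(wide$id)])

wide$correct
# [1] 1 0 0
```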
OK, I built an example and tried it out. It's not a dplyr/tidyr approach, though.
d1 <- data.table::data.table(x_key = c(-101,-102,-103), y_key = c(669,668,667),
drug1 = c("Macrolides",NA,"Macrolides"),
drug2 = c(NA, "Cephalosporins", "Cephalosporins"))
x_key y_key drug1 drug2
1: -101 669 Macrolides NA
2: -102 668 NA Cephalosporins
3: -103 667 Macrolides Cephalosporins
d1$correct <- rowSums(apply(d1, 2, function(r) (r %in% c("Macrolides", "Cephalosporins")))[,-c(1:2)]*1)>=2
d1
x_key y_key drug1 drug2 correct
1: -101 669 Macrolides NA FALSE
2: -102 668 NA Cephalosporins FALSE
3: -103 667 Macrolides Cephalosporins TRUE
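The problem you're having is that you're calling apply with margin 1, when what you really want is margin 2. This checks whether there are at least 2 TRUEs, so it only works the way you want if the same drug is never repeated in a row (e.g., Macrolides appearing twice gives 2 TRUEs, hence correct == TRUE).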
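The repeated-drug caveat is easy to see on a small example. The sketch below uses a made-up character matrix `drugs` (not the poster's data) to compare the count-based check with a per-drug check. The names `count_based` and `per_drug` are just for illustration.

```r
# Toy drug matrix: one row per patient; row 4 repeats the same drug.
drugs <- rbind(
  c("Macrolides", NA),
  c(NA, "Cephalosporins"),
  c("Macrolides", "Cephalosporins"),
  c("Macrolides", "Macrolides")
)
targets <- c("Macrolides", "Cephalosporins")

# Count-based check: at least two cells match either target.
# %in% drops the matrix shape, so it is restored before rowSums().
count_based <- rowSums(matrix(drugs %in% targets, nrow = nrow(drugs))) >= 2

# Per-drug check: every target must appear somewhere in the row.
per_drug <- apply(drugs, 1, function(r) all(targets %in% r))

count_based
# [1] FALSE FALSE  TRUE  TRUE
per_drug
# [1] FALSE FALSE  TRUE FALSE
```

The two checks agree on the first three rows and differ only on the row with the repeated drug, so the count-based version is fine as long as a drug cannot appear twice in one row.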