Search and extract matching strings by comparing every value with all other values in a column (even when the order of the values differs) in R
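I have a column, and I am trying to compare each value with every value in the column, through to the last one, and extract the matching strings, even when the values appear in a different order (for example, rows 20 and 24). The output data frame should be df_out.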
Col_1 = c("AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB")
df_input = data.frame(Col_1)
The expected output data frame is as follows:
Col_1 = c("AB,CD,EF","AB,CD,EF","AB,CD,EF","AB,CD,EF","AB,CD,EF", "AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","AB,CD,EF,GH","MN,OP","MN,OP","MN,OP","MN,OP","MN,OP", "AB,MN,OP","AB,MN,OP","AB,MN,OP","AB,MN,OP","AB,MN,OP",
"OP,MN,AB","OP,MN,AB","OP,MN,AB","OP,MN,AB","OP,MN,AB")
Col_2 = c("AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB", "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB", "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB", "AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB",
"AB,CD,EF","AB,CD,EF,GH","MN,OP","AB,MN,OP","OP,MN,AB")
match = c("Complete Match","AB,CD","NO Matching","AB","AB","AB,CD","Complete Match","NO Matching","AB","AB","NO Matching","NO Matching","Complete Match",
"MN,OP","MN,OP","AB","AB","MN,OP","Complete Match","Complete Match","AB","AB","MN,OP","Complete Match","Complete Match")
df_out = data.frame(Col_1,Col_2,match)
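The Col_1/Col_2 pairing above does not have to be typed by hand. A minimal sketch, assuming base R only, that builds the same 25 rows from df_input with rep() (the name pairs is hypothetical and not part of the question):

```r
Col_1 <- c("AB,CD,EF", "AB,CD,EF,GH", "MN,OP", "AB,MN,OP", "OP,MN,AB")
df_input <- data.frame(Col_1, stringsAsFactors = FALSE)

n <- nrow(df_input)

# Every value paired with every value (including itself): n * n rows.
pairs <- data.frame(
  Col_1 = rep(df_input$Col_1, each = n),   # each value repeated n times in a row
  Col_2 = rep(df_input$Col_1, times = n),  # the whole column cycled n times
  stringsAsFactors = FALSE
)
```

Here pairs$Col_1 and pairs$Col_2 follow the same row order as the hand-written vectors in df_out.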
I have tried using grepl, but I could not get the required output.
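One likely reason grepl falls short here: it tests whether one string occurs inside another as written, so the same items in a different order do not match. A small sketch of the difference, assuming the comma-separated values are meant to be treated as sets (the variable names are illustrative only):

```r
x <- "AB,MN,OP"
y <- "OP,MN,AB"

# Substring test: is x, exactly as written, contained in y?
as_substring <- grepl(x, y, fixed = TRUE)

# Set test: split on commas first, then compare ignoring order.
as_sets <- setequal(
  strsplit(x, ",", fixed = TRUE)[[1]],
  strsplit(y, ",", fixed = TRUE)[[1]]
)
```

With these two values, as_substring is FALSE while as_sets is TRUE, which is why splitting the strings before comparing is the more promising route.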
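Here is a (somewhat messy) solution:
funcmatch <- function(a, b) {
  ma <- match(a, b)  # position of each element of a in b, NA where absent
  if (all(is.na(ma)))
    return("NO MATCH")
  else if (!anyNA(ma) && length(a) == length(b))  # same items, any order
    return("COMPLETE MATCH")
  else
    return(paste0(a[!is.na(ma)], collapse = ","))  # items of a found in b
}
# Col_1 and Col_2 are the 25-element vectors from the question
mapply(funcmatch, strsplit(Col_1, ","), strsplit(Col_2, ","))
And the result:
 [1] "COMPLETE MATCH" "AB,CD,EF" "NO MATCH"
 [4] "AB" "AB" "AB,CD,EF"
 [7] "COMPLETE MATCH" "NO MATCH" "AB"
[10] "AB" "NO MATCH" "NO MATCH"
[13] "COMPLETE MATCH" "MN,OP" "MN,OP"
[16] "AB" "AB" "MN,OP"
[19] "COMPLETE MATCH" "COMPLETE MATCH" "AB"
[22] "AB" "OP,MN" "COMPLETE MATCH"
[25] "COMPLETE MATCH"
Partial matches are listed in the order in which they appear in the first value, so row 23 gives "OP,MN". Rows 2 and 6 give "AB,CD,EF" rather than the "AB,CD" shown in the question, because EF is also common to both values.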
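As an alternative sketch, the same comparison can be written with base R's set functions intersect() and setequal(), which ignore element order. The helper name compare_sets is hypothetical, and sorting the common items is an assumption made here so that partial matches come out in a stable order such as "MN,OP":

```r
# Hypothetical helper: compare two comma-separated strings as sets.
compare_sets <- function(x, y) {
  a <- strsplit(x, ",", fixed = TRUE)[[1]]
  b <- strsplit(y, ",", fixed = TRUE)[[1]]
  common <- sort(intersect(a, b))           # shared items, sorted (assumption)
  if (length(common) == 0) return("NO MATCH")
  if (setequal(a, b)) return("COMPLETE MATCH")
  paste(common, collapse = ",")
}

vals <- c("AB,CD,EF", "AB,CD,EF,GH", "MN,OP", "AB,MN,OP", "OP,MN,AB")

# expand.grid varies its first argument fastest, so Col_2 cycles within Col_1.
df_out <- expand.grid(Col_2 = vals, Col_1 = vals,
                      stringsAsFactors = FALSE)[, c("Col_1", "Col_2")]
df_out$match <- mapply(compare_sets, df_out$Col_1, df_out$Col_2,
                       USE.NAMES = FALSE)
```

This keeps the result next to the two compared columns in one data frame, using the labels "NO MATCH" and "COMPLETE MATCH" rather than the exact wording in the question.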