How to return rows in one DataFrame that partially match the rows in another DataFrame (string match)
I want to return all rows in list2 that contain one of the strings in list1.
library(tibble)
list1 <- tibble(name = c("the setosa is pretty", "the versicolor is the best", "the mazda is not a flower"))
list2 <- tibble(name = c("the setosa is pretty and the best flower", "the versicolor is the best and a red flower", "the mazda is a great car"))
For example, the code should return "the setosa is pretty and the best flower" from list2, because it contains the phrase "the setosa is pretty" from list1. I tried:
grepl(list1$name, list2$name)
But I get the following warning:
Warning message:
In grepl(commonPhrasesNPSLessthan6$value, dfNPSLessthan6$nps_comment) :
  argument 'pattern' has length > 1 and only the first element will be used
Any help is much appreciated. Thanks!
EDIT
list1 <- structure(list(value = c("it would not let me", "to go back and change",
"i was not able to", "there is no way to", "to pay for a credit"
), n = c(15L, 14L, 12L, 11L, 9L)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
list2 <- structure(list(comment = c("it would not let me go back and change things",
"There is no way to back up without starting allover.", "Could not link blah blah account. ",
"i really just want to speak to someone - and, now that I'm at the very end of the process-",
"i felt that some of the information that was asked to provide wasn't necessary",
"i was not able to to go back and make changes")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame")
)
EDIT: based on the new data:
library(dplyr)
list2 %>%
  filter(stringr::str_detect(comment, paste0(list1$value, collapse = "|")))
# A tibble: 2 x 1
comment
<chr>
1 it would not let me go back and change things
2 i was not able to to go back and make changes
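Note that the comment "There is no way to back up without starting allover." is not returned, because the match is case-sensitive and the phrase in list1 starts with a lowercase "t". A minimal sketch of a case-insensitive variant in base R, assuming that ignoring case is acceptable for your data; the shortened `phrases` and `comments` vectors are stand-ins for list1$value and list2$comment.

```r
# Stand-ins for list1$value and list2$comment (shortened for illustration)
phrases <- c("it would not let me", "i was not able to", "there is no way to")
comments <- c("it would not let me go back and change things",
              "There is no way to back up without starting allover.",
              "Could not link blah blah account. ",
              "i was not able to to go back and make changes")

# Collapse the phrases into a single alternation pattern
pattern <- paste(phrases, collapse = "|")

# ignore.case = TRUE lets "There is no way to" match "there is no way to"
hits <- grepl(pattern, comments, ignore.case = TRUE)

comments[hits]
```

With the tibbles from the question, the same logical vector could be used as list2[hits, ].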
ORIGINAL
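A stringr option. The phrases are collapsed with "|" so that every row of list2 is checked against every phrase in list1; passing list1$name directly as the pattern would only compare the two columns element by element:
list2[stringr::str_detect(list2$name, paste(list1$name, collapse = "|")), ]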
# A tibble: 2 x 1
name
<chr>
1 the setosa is pretty and the best flower
2 the versicolor is the best and a red flower
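To see which phrase was responsible for each match, one option is to build a logical matrix with one row per text and one column per phrase. The following is only a sketch in base R (to avoid an extra dependency); the vectors `phrases` and `texts` are stand-ins for list1$name and list2$name, and `fixed = TRUE` is an assumption that the phrases should be matched literally.

```r
# Stand-ins for list1$name and list2$name
phrases <- c("the setosa is pretty",
             "the versicolor is the best",
             "the mazda is not a flower")
texts <- c("the setosa is pretty and the best flower",
           "the versicolor is the best and a red flower",
           "the mazda is a great car")

# Rows correspond to texts, columns to phrases
m <- vapply(phrases,
            function(p) grepl(p, texts, fixed = TRUE),
            logical(length(texts)))

# First matching phrase for each text, or NA when nothing matches
first_match <- apply(m, 1, function(r) phrases[r][1])

# Logical index usable as list2[any_match, ]
any_match <- rowSums(m) > 0
```

Here `any_match` can be used to subset list2, while `first_match` records the phrase behind each hit.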
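A base-only solution. grepl() is run once per phrase and the results are combined with Reduce(), so the logical index has one entry per row of list2 rather than one per phrase:
list2[Reduce(`|`, lapply(list1$name, grepl, list2$name)), ]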
# A tibble: 2 x 1
name
<chr>
1 the setosa is pretty and the best flower
2 the versicolor is the best and a red flower
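If the phrases may contain regex metacharacters such as "." or "(", they would be interpreted as regex syntax by the calls above. A sketch of a literal-matching variant, assuming that plain substring matching is what you want; `match_any` is a hypothetical helper name, and `phrases` and `texts` stand in for list1$name and list2$name.

```r
# Hypothetical helper: TRUE for each element of `x` that contains at least
# one of `phrases` as a literal substring (no regex interpretation).
match_any <- function(x, phrases) {
  hits <- rep(FALSE, length(x))
  for (p in phrases) {
    # fixed = TRUE matches `p` character for character
    hits <- hits | grepl(p, x, fixed = TRUE)
  }
  hits
}

phrases <- c("the setosa is pretty",
             "the versicolor is the best",
             "the mazda is not a flower")
texts <- c("the setosa is pretty and the best flower",
           "the versicolor is the best and a red flower",
           "the mazda is a great car")

keep <- match_any(texts, phrases)
texts[keep]
```

Because the result always has one entry per element of `x`, it can be used directly as list2[keep, ] regardless of how many phrases there are.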