Returning specific strings found in text
I have the following column in df:
c("I love bananas and apples.",
"I hate apples and pears.",
"I love to eat food.",
"I hate lettuce and bananas")
I have a vector of fruit:
fruit <- c("apples", "bananas", "pears")
I know that str_detect returns TRUE or FALSE for each observation, using
str_detect(df$text, paste(fruit, collapse='|'))
But what I want is a column containing the matched values, like this:
"I love bananas and apples." "bananas","apples"
"I hate apples and pears." "apples","pears"
"I love to eat food."
"I hate lettuce and bananas." "bananas"
Is there a way to do this? Is it outside the scope of str_detect?
We can use str_extract_all to extract all the 'fruit' elements from the 'text' column into a list, loop over that list with map_chr, and paste (toString) the matches together to create the 'newtext' column:
library(stringr)
library(dplyr)
library(purrr)
df %>%
  mutate(newtext = map_chr(str_extract_all(text,
      str_c(fruit, collapse='|')), ~toString(unique(.x))))
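For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. It assumes the column is called `text` and lives in a data frame called `df`, as in the question; the intermediate names `pattern` and `result` are only illustrative.

```r
library(stringr)
library(dplyr)
library(purrr)

# Sample data from the question; `df` and `text` are the names used there
df <- data.frame(
  text = c("I love bananas and apples.",
           "I hate apples and pears.",
           "I love to eat food.",
           "I hate lettuce and bananas"),
  stringsAsFactors = FALSE
)
fruit <- c("apples", "bananas", "pears")

# One alternation pattern: "apples|bananas|pears"
pattern <- str_c(fruit, collapse = "|")

# str_extract_all() returns one character vector per row;
# map_chr() collapses each of them into a single comma-separated string
result <- df %>%
  mutate(newtext = map_chr(str_extract_all(text, pattern),
                           ~ toString(unique(.x))))

result$newtext
#> [1] "bananas, apples" "apples, pears"   ""                "bananas"
```

Matches come back in the order they appear in each string, and a row with no match gives an empty string rather than NA.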
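A base R alternative is to loop over the strings with sapply and keep every element of fruit that grepl finds in each one:
sapply(df$text, function(s){
  toString(unlist(lapply(fruit, function(f){
    if(grepl(f, s)) f
  })))
},
USE.NAMES = FALSE)
#[1] "apples, bananas" "apples, pears" "" "bananas"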
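Note that the grepl loop reports matches in the order of `fruit`, not in the order they occur in the text. If order of appearance matters and you want to stay in base R, one possible sketch uses gregexpr and regmatches instead. This is a swapped-in technique rather than part of the answers above, and the names `text`, `pattern`, `matches` and `newtext` are only illustrative.

```r
# Sample data from the question
text <- c("I love bananas and apples.",
          "I hate apples and pears.",
          "I love to eat food.",
          "I hate lettuce and bananas")
fruit <- c("apples", "bananas", "pears")

# Same alternation pattern as before: "apples|bananas|pears"
pattern <- paste(fruit, collapse = "|")

# gregexpr() locates every match; regmatches() pulls them out as a list
# with one character vector per input string (character(0) if no match)
matches <- regmatches(text, gregexpr(pattern, text))

# Collapse each vector of matches into one comma-separated string
newtext <- vapply(matches, function(m) toString(unique(m)), character(1))

newtext
#> [1] "bananas, apples" "apples, pears"   ""                "bananas"
```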