Text mining between a data frame column and 2 lists in R

I have created two lists of words:

fruits <- c("banana","apple","strawberry")
homemade <- c("kitchen","homemade","mom","dad","sister")

Here is my dataset:

description                isCake
apple cake cooked by mom   YES
pie from the bakery        NO
strawberry dessert by dad  NO

I want to write text-mining code so that df$isCake becomes "OK" whenever df$description contains one or more words from fruits and one or more words from homemade.

Expected output:

description                isCake
apple cake cooked by mom   YES
pie from the bakery        NO
strawberry dessert by dad  OK

This is what I tried:

df <- df %>% mutate(isCake=ifelse(description %in% fruits & description %in% homemade, "OK", isCake))

I don't get any error message, but it clearly doesn't work: when I subset on isCake == "OK", I always get 0 obs.

You can build a pattern from the fruits and homemade vectors and use it in grepl:

df$isCake[grepl(paste0(fruits, collapse = '|'), df$description) & 
          grepl(paste0(homemade, collapse = '|'), df$description)] <- 'OK'

df
#                description isCake
#1  apple cake cooked by mom     OK
#2       pie from the bakery     NO
#3 strawberry dessert by dad     OK
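
A self-contained sketch may help show why the `%in%` attempt in the question matched nothing, and how the `grepl` approach could be tightened. It rebuilds the example data from the question. `word_pattern` is a hypothetical helper (not from the original posts) that wraps the alternation in word boundaries so that, for example, "apple" would not match inside "pineapple". The extra `df$isCake != "YES"` condition is an assumption, added only because the expected output in the question leaves the first row as YES.

```r
# Rebuild the example data from the question.
df <- data.frame(
  description = c("apple cake cooked by mom",
                  "pie from the bakery",
                  "strawberry dessert by dad"),
  isCake = c("YES", "NO", "NO"),
  stringsAsFactors = FALSE
)
fruits <- c("banana", "apple", "strawberry")
homemade <- c("kitchen", "homemade", "mom", "dad", "sister")

# %in% checks whether each whole string equals an element of the vector,
# so a full description never matches a single word.
df$description %in% fruits
# [1] FALSE FALSE FALSE

# Hypothetical helper: alternation wrapped in word boundaries (\b),
# so only whole words match.
word_pattern <- function(words) {
  paste0("\\b(", paste(words, collapse = "|"), ")\\b")
}

has_fruit <- grepl(word_pattern(fruits), df$description, perl = TRUE)
has_homemade <- grepl(word_pattern(homemade), df$description, perl = TRUE)

# Assumption: rows already marked "YES" are left untouched.
df$isCake[has_fruit & has_homemade & df$isCake != "YES"] <- "OK"
df$isCake
# [1] "YES" "NO"  "OK"
```

Dropping the `df$isCake != "YES"` condition gives the same result as the answer above, where the first row also becomes OK.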