Text mining between a data frame column and 2 lists in R
So I have created two lists (character vectors) of words:
fruits <- c("banana","apple","strawberry")
homemade <- c("kitchen","homemade","mom","dad","sister")
Here is my dataset:
description | isCake |
---|---|
apple cake cooked by mom | YES |
pie from the bakery | NO |
strawberry dessert by dad | NO |
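To make the example reproducible, here is a minimal sketch of how df might be built. The post does not show this, so the construction below is an assumption: both columns are taken to be plain character vectors (stringsAsFactors = FALSE keeps them that way on R versions before 4.0).

```r
# Hypothetical reconstruction of the question's data; column types are assumed
fruits   <- c("banana", "apple", "strawberry")
homemade <- c("kitchen", "homemade", "mom", "dad", "sister")

df <- data.frame(
  description = c("apple cake cooked by mom",
                  "pie from the bakery",
                  "strawberry dessert by dad"),
  isCake = c("YES", "NO", "NO"),
  stringsAsFactors = FALSE  # keep text columns as character, not factor
)
str(df)
```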
I want to write text-mining code so that when df$description contains one or more words from fruits and one or more words from homemade, df$isCake becomes "OK".
Expected output:
description | isCake |
---|---|
apple cake cooked by mom | YES |
pie from the bakery | NO |
strawberry dessert by dad | OK |
df <- df %>% mutate(isCake=ifelse(description %in% fruits & description %in% homemade, "OK", isCake))
I don't get any error message, but it clearly doesn't work: when I subset on isCake == "OK", I always get 0 obs.
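The reason the mutate() call never yields "OK" is that %in% tests whether each whole element of description is equal to one of the words, not whether a word appears somewhere inside it. A small base-R sketch of the difference (variable names mirror the question; the print() calls are only for illustration):

```r
fruits <- c("banana", "apple", "strawberry")
description <- c("apple cake cooked by mom",
                 "pie from the bakery",
                 "strawberry dessert by dad")

# %in% asks: is the whole element equal to one of the words?
in_result <- description %in% fruits
print(in_result)            # FALSE FALSE FALSE: no description is exactly a fruit name
print("apple" %in% fruits)  # TRUE: here the whole element is "apple"

# grepl() asks: does the pattern occur anywhere inside the string?
grep_result <- grepl(paste0(fruits, collapse = "|"), description)
print(grep_result)          # TRUE FALSE TRUE
```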
You can build a pattern from the fruits and homemade vectors and use it in grepl():
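# Collapse each vector into an alternation pattern ("banana|apple|strawberry", ...)
# and set isCake to 'OK' where the description matches a word from both vectors
df$isCake[grepl(paste0(fruits, collapse = '|'), df$description) &
          grepl(paste0(homemade, collapse = '|'), df$description)] <- 'OK'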
df
# description isCake
#1 apple cake cooked by mom OK
#2 pie from the bakery NO
#3 strawberry dessert by dad OK
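One caveat with the plain alternation pattern: it matches substrings, so "mom" also matches "moment" and "apple" also matches "pineapple". A minimal sketch of a stricter variant, assuming whole-word matching is what you want: each alternation is wrapped in \b word boundaries (with perl = TRUE), and the result is written back with ifelse() as in the question. The helper word_pattern() and the fourth data row are hypothetical, added only to show the difference.

```r
fruits   <- c("banana", "apple", "strawberry")
homemade <- c("kitchen", "homemade", "mom", "dad", "sister")

df <- data.frame(
  description = c("apple cake cooked by mom",
                  "pie from the bakery",
                  "strawberry dessert by dad",
                  "pineapple tart for my moment"),  # hypothetical extra row
  isCake = c("YES", "NO", "NO", "NO"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds "\\b(word1|word2|...)\\b" so only whole words match
word_pattern <- function(words) paste0("\\b(", paste(words, collapse = "|"), ")\\b")

has_fruit    <- grepl(word_pattern(fruits),   df$description, perl = TRUE)
has_homemade <- grepl(word_pattern(homemade), df$description, perl = TRUE)

# Same shape as the question's ifelse(), but with pattern tests instead of %in%
df$isCake <- ifelse(has_fruit & has_homemade, "OK", df$isCake)
df
```

With this version the fourth row stays "NO", whereas the unbounded pattern would flag it as "OK". The same two grepl() calls also work inside mutate(isCake = ifelse(...)) if you prefer the dplyr style from the question, and adding ignore.case = TRUE to grepl() handles capitalized words.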