Highlight words in R list of strings based on term document matrix

Question

Below is a dataframe of campaign data:

Subject                      Response Rate(%)   Campaign Type   Channel
Buy Stunning Phone A         81.00              A               e-mail
Special Emi OFFER            81.00              B               e-mail
Buy Stunning Phone at EMI    73.00              C               SMS
The game changer is here.    85.00              A               SMS
Buy Stunnig Phone A          80.00              A               SMS
Special Emi OFFER            88.00              B               e-mail
Buy Stunning Phone at EMI    48.00              C               e-mail
The game changer is here.    48.00              A               e-mail
Buy Stunning Phone           89.00              A               e-mail
Special Emi OFFER            89.00              B               SMS
Buy Stunning Phone at EMI    69.00              C               SMS

I have created a term document matrix as follows:

Word         Frequency
big          10
upgrade      10
worth        10
latest       9
much         9
phone        8
exciting     8
back         7
colours      7
case         6
stylish      6
clear        6
experience   5
time         5

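I have subset the data by channel type using dplyr, in decreasing order of response rate. For each subject, I want to highlight/list the words from the term document matrix: if a word occurs in the subject, it should be listed in a separate column next to that subject. I have not been able to find a way to do this.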

Answer 1

Do you mean something like this?

library(dplyr)  # loaded for the %>% pipe

# Campaign data from the question
df <- read.table(header = TRUE, sep = ",", text = "Subject,Response Rate(%),Campaign Type,Channel
Buy Stunning Phone A,81.00,A,e-mail
Special Emi OFFER,81.00,B,e-mail
Buy Stunning Phone at EMI,73.00,C,SMS
The game changer is here.,85.00,A,SMS
Buy Stunnig Phone A,80.00,A,SMS
Special Emi OFFER,88.00,B,e-mail
Buy Stunning Phone at EMI,48.00,C,e-mail
The game changer is here.,48.00,A,e-mail
Buy Stunning Phone,89.00,A,e-mail
Special Emi OFFER,89.00,B,SMS
Buy Stunning Phone at EMI,69.00,C,SMS")

# Term frequencies from the question
df2 <- read.table(header = TRUE, sep = ",", text = "Word,Frequency
big,10
upgrade,10
worth,10
latest,9
much,9
phone,8
exciting,8
back,7
colours,7
case,6
stylish,6
clear,6
experience,5
time,5")

# One column per word, one row per subject: position of the first
# case-insensitive match of the word in the subject, or -1 if there is none
m <- sapply(df2$Word %>% as.character() %>% trimws(), regexpr,
            text = df$Subject %>% as.character(), ignore.case = TRUE)

# For each subject, join the words that matched into a comma-separated string
df$keyWord <- sapply(seq_len(nrow(m)), function(idx) {
  matched <- m[idx, ] > 0
  paste0(names(matched)[matched], collapse = ",")
})
df
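
Note that regexpr matches substrings, so a word such as "case" would also be found inside "showcase". If that matters, a whole-word variant might look like the sketch below. It is only an illustration under a few assumptions: match_terms is a hypothetical helper, the column name ResponseRate is a simplified stand-in for the question's header, only a few rows and words from the question are used, and the words are assumed to contain no regex metacharacters. The last step shows one possible way to do the per-channel subsetting in decreasing order of response rate that the question mentions.

```r
library(dplyr)

# Stand-in data: a few rows from the question, with a simplified
# column name (ResponseRate) chosen for this sketch.
campaigns <- data.frame(
  Subject = c("Buy Stunning Phone A", "Special Emi OFFER",
              "Buy Stunning Phone at EMI", "The game changer is here."),
  ResponseRate = c(81, 88, 73, 85),
  Channel = c("e-mail", "e-mail", "SMS", "SMS"),
  stringsAsFactors = FALSE
)
terms <- c("big", "upgrade", "phone", "case", "time")

# Hypothetical helper: comma-separated terms that occur as whole words
# (case-insensitive) in a single subject string.
match_terms <- function(subject, terms) {
  hits <- vapply(terms, function(term) {
    grepl(paste0("\\b", term, "\\b"), subject, ignore.case = TRUE)
  }, logical(1))
  paste(terms[hits], collapse = ",")
}

# Add the matched terms as a new column next to each subject.
campaigns$keyWord <- vapply(campaigns$Subject, match_terms, character(1),
                            terms = terms, USE.NAMES = FALSE)

# Subset one channel and order it by decreasing response rate.
sms <- campaigns %>%
  filter(Channel == "SMS") %>%
  arrange(desc(ResponseRate))
sms
```

Subjects with no matching word get an empty string in keyWord, the same as in the answer above.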
