R: Cross check a text document with a set of data frame/dictionary of words

Question

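I'm new to R and to coding, and I'm still a student. I have a text document that I want to cross check against a dictionary stored as a data frame. I have already cleaned the document and tokenized it into root words. The objective is to cross check Document1 against dictionary1 to see whether any word in Document1 matches a word in the dictionary. If it does, the document will be labeled according to its class. An example would look like this: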

Document1 <- "One simple text"
dictionary1 <- data.frame("Term"= c("teacher", "simple", "shoot", "text"))

if (strcmp(Document1, dictionary1)) {
  print('Success')
} else {
  print('Failed')
}

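The code I tried prints "Failed", even though the words "simple" and "text" in Document1 both match dictionary entries. How can I fix this? Do I need to run strsplit on Document1 first and then compare with the strcmp function? Thanks in advance to anyone who can provide a solution, and apologies for my poor English.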

Answer 1

Welcome to SO. Most of this can be done directly on a data.frame in R:

library(tidyverse)

Document1 <- "One simple text"
dictionary1 <- data.frame("Term" = c("teacher", "simple", "shoot", "text"))


df_results <- dictionary1 %>%
  mutate(result = str_detect(string = Document1, pattern = Term))

df_results
#>      Term result
#> 1 teacher  FALSE
#> 2  simple   TRUE
#> 3   shoot  FALSE
#> 4    text   TRUE


# any() is TRUE as soon as at least one dictionary term was found
if (any(df_results$result)) {
  print("Success")
} else {
  print("Failure")
}
#> [1] "Success"

Document1 <- "Nothing Matters"
dictionary1 <- data.frame("Term" = c("teacher", "simple", "shoot", "text"))


df_results <- dictionary1 %>%
  mutate(result = str_detect(string = Document1, pattern = Term))

df_results
#>      Term result
#> 1 teacher  FALSE
#> 2  simple  FALSE
#> 3   shoot  FALSE
#> 4    text  FALSE


# No term matched this time, so any() is FALSE
if (any(df_results$result)) {
  print("Success")
} else {
  print("Failure")
}
#> [1] "Failure"

# Another way is using vectors

Document1 <- "simple right"
dictionary1 <- data.frame("Term" = c("teacher", "simple", "shoot", "text"))

result <- str_detect(string = Document1, pattern = dictionary1$Term)

result
#> [1] FALSE  TRUE FALSE FALSE

if (any(result)) {
  print("Success")
} else {
  print("Failure")
}
#> [1] "Success"
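
One caveat: str_detect() treats each term as a regular expression and matches substrings, so "text" would also be found inside "context". On the strsplit part of the question, here is a minimal base R sketch that compares whole words instead. It assumes the document is already cleaned and that words are separated by whitespace; the names tokens and matched are made up for illustration and do not come from the original post.

```r
Document1 <- "One simple text"
dictionary1 <- data.frame(
  Term = c("teacher", "simple", "shoot", "text"),
  stringsAsFactors = FALSE
)

# Split the document into lower-case tokens on runs of whitespace
# (assumption: the text is already cleaned, so punctuation is not handled here)
tokens <- strsplit(tolower(Document1), "\\s+")[[1]]

# Dictionary terms that occur as whole words in the document
matched <- dictionary1$Term[dictionary1$Term %in% tokens]
matched
#> [1] "simple" "text"

if (length(matched) > 0) {
  print("Success")
} else {
  print("Failed")
}
#> [1] "Success"
```

Because %in% compares complete tokens, a document containing only "context" would not count as a match for "text" here.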

Created on 2020-06-11 by the reprex package (v0.3.0)
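
The question also mentions labeling the document by class once a match is found. The post does not show what the classes look like, so the following is only a sketch under assumptions: the Class column, its values, and the majority-vote rule are all hypothetical and chosen purely for illustration.

```r
Document1 <- "One simple text"

# Hypothetical dictionary: the Class column and its values are invented
dictionary1 <- data.frame(
  Term  = c("teacher", "simple", "shoot", "text"),
  Class = c("education", "general", "violence", "general"),
  stringsAsFactors = FALSE
)

# Whole-word tokens of the document (assumes whitespace-separated, cleaned text)
tokens <- strsplit(tolower(Document1), "\\s+")[[1]]

# Keep only the dictionary rows whose term occurs in the document
hits <- dictionary1[dictionary1$Term %in% tokens, ]

# Label the document with the most frequent class among the matches,
# or NA when nothing matched (majority vote is an assumed rule)
if (nrow(hits) > 0) {
  document_class <- names(which.max(table(hits$Class)))
} else {
  document_class <- NA_character_
}
document_class
#> [1] "general"
```

If a document should be able to carry several labels, unique(hits$Class) could be used instead of the majority vote.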


Tags: text, dictionary, r, classification