How to build a dictionary using {quanteda} with various adjectives connected to a single noun?

Question

Suppose I have text data like the sample text below, which I need to analyze using a custom-built dictionary.

Good X. Perfect X. Magnificent X. Extraordinary X. Bad X. Abysmal X. Very poor X.

Based on this, I would like to build a dictionary with {quanteda} using the following code structure:

dict <- quanteda::dictionary(list(.))

Is there a way to specify the adjectives I am looking for without having to type X for every adjective of interest?

# Example of what I want to avoid:
dict <- quanteda::dictionary(list(
  list_1 = c("good X", "perfect X",...)
))

So ideally I need something like "good/perfect/... X", but I know it won't work like this. Is there a workaround?

Answer 1

Yes. You can list the preceding adjectives as a regular expression, joined with the "or" operator |, followed by X.

Here I use exclusive = FALSE only to show which tokens have been replaced by the key and which have not.

library("quanteda")
## Package version: 1.4.3
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.

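# Each whitespace-separated part is a regex matched against one token:
# an anchored adjective alternation, followed by the noun X (anchored too,
# so that it does not match any token that merely contains an "x").
dict <- dictionary(list(mykey = "^(good|perfect|magnificent)$ ^X$"))

tokens("I had a good X at the magnificent X hotel.") %>%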
  tokens_lookup(dictionary = dict, valuetype = "regex", exclusive = FALSE)
## tokens from 1 document.
## text1 :
## [1] "I"     "had"   "a"     "MYKEY" "at"    "the"   "MYKEY" "hotel" "."
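
If the adjective list is long, the pattern does not have to be typed by hand. The following is a minimal sketch, not part of the original answer, that builds the same kind of dictionary value from a character vector. The names adjectives, adj_pattern, dict_value and phrases, and the key name positive, are hypothetical and chosen only for illustration; "X" stands in for the real noun, as in the question. The quanteda step is guarded so that it only runs when the package is installed.

```r
# Hypothetical adjective list; swap in your own terms.
adjectives <- c("good", "perfect", "magnificent", "extraordinary")

# Collapse the adjectives into one anchored alternation: ^(good|perfect|...)$
adj_pattern <- paste0("^(", paste(adjectives, collapse = "|"), ")$")

# Append the noun as a second, anchored token pattern.
dict_value <- paste(adj_pattern, "^X$")
print(dict_value)

# Alternative without regular expressions: expand to explicit two-word phrases.
phrases <- paste(adjectives, "X")
print(phrases)

# Assumption: quanteda is installed; otherwise this part is skipped.
if (requireNamespace("quanteda", quietly = TRUE)) {
  dict <- quanteda::dictionary(list(positive = dict_value))
  toks <- quanteda::tokens("A perfect X and an extraordinary X, but no X here.")
  print(quanteda::tokens_lookup(toks, dictionary = dict,
                                valuetype = "regex", exclusive = FALSE))
}
```

The phrases vector could be passed to dictionary() instead of dict_value, in which case the lookup can use the default glob matching and no regex anchors are needed. Either way, X is written once in the code rather than once per adjective.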


dictionary

r

quanteda