How to find trailing and leading words of a word using R?

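I have a text document with a million words. I need to know how to find the words that come directly before and after a given word using R.

For example, suppose I want to find the words before and after the word "error". The leading words could be anything, such as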

"typo error"
"manual error"
"system error"

and the trailing words could be, for example,

"error corrected"
"error found"
"error occurred"

Any idea how to do this? Thanks in advance for your input.

For the words before "error":

x <- "no error and no error and some error" # input

library(gsubfn)
rx <- "(\\w+) error" # backslashes must be doubled inside an R string
table(strapplyc(x, rx)[[1]])

giving:

  no some 
   2    1

For the words after "error", replace rx with:

rx <- "error (\\w+)"
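
If you would rather not depend on gsubfn, here is a rough base R sketch of the same idea. It is my own variation, not part of the answer above: it assumes words are separated by a single space and uses Perl-style lookarounds so that only the neighbouring word is returned. The names before and after are just illustrative.

```r
x <- "no error and no error and some error"  # same input as above

# Word directly before "error": a whole word followed by " error" (lookahead)
before <- regmatches(x, gregexpr("\\b\\w+(?= error\\b)", x, perl = TRUE))[[1]]

# Word directly after "error": a whole word preceded by "error " (lookbehind)
after <- regmatches(x, gregexpr("(?<=\\berror )\\w+", x, perl = TRUE))[[1]]

table(before)  # counts of leading words
table(after)   # counts of trailing words
```

Because the lookarounds do not consume "error" itself, the same occurrence can contribute both a leading and a trailing word.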

How about this:

test <- "I don't want to match error this This is a random error what I want to match"
# split the text into words on spaces (strsplit returns a list, so take element 1)
words <- strsplit(test, " ")[[1]]
# positions of the words that contain 'error'
indexes <- grep("error", words)

# words that come right after 'error' (NA if 'error' is the last word)
words[indexes + 1]
# words that come right before 'error' (dropped if 'error' is the first word)
words[indexes - 1]
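
One thing to watch with the index approach above is the edges of the text, and the fact that grep also hits words like "errors". A possible workaround, sketched here under the assumption that you want exact whole-word matches, is to pad the word vector with NA on both sides so every match has a neighbour slot. The padded vector and the example sentence are made up for illustration.

```r
test <- "error at the start and an errors list and a final error"
words <- strsplit(test, " ", fixed = TRUE)[[1]]

# exact matches only, so "errors" is not counted
idx <- which(words == "error")

# pad with NA so that the first and last word still have a "neighbour"
padded <- c(NA_character_, words, NA_character_)

before <- padded[idx]      # padded[i] is words[i - 1]
after  <- padded[idx + 2]  # padded[i + 2] is words[i + 1]

data.frame(before, after)
```

Both vectors always have one entry per match, with NA where there is no previous or next word, so they can be tabulated or combined safely.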

My solution uses str_match_all:

library(stringr)
txt <- "system error corrected hardcore error detected wtf error holymoly"
str_match_all(txt, "\\s*(\\w+)\\serror\\s*(\\w+)")

[[1]]
     [,1]                       [,2]       [,3]       
[1,] "system error corrected"   "system"   "corrected"
[2,] " hardcore error detected" "hardcore" "detected" 
[3,] " wtf error holymoly"      "wtf"      "holymoly" 