Multiple matches and multiple excludes in regular expressions in R using PCRE

Question

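I am new to regular expressions in R, and I am trying to match a vector of strings against patterns that must be present while excluding others. I searched Stack Overflow, and it seems nobody has asked a similar question. Here is the vector of strings, mystring, to match against: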

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")

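For each element of mystring, I want to find out whether it contains any permutation of the 6 letters of "wanted" without containing the string "wanted" itself. The same applies to any permutation of the 7 letters of "desired" and of the 3 letters of "yes", excluding the strings "desired" and "yes" themselves.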

So the expected output of grepl(pattern, mystring, perl = TRUE) should be:

[1] TRUE, FALSE, TRUE, FALSE, FALSE, TRUE

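I would like to use the perl option of grepl, which can speed up the function. Could anyone offer some pointers on this pattern? Could you also explain what each part of the pattern means, since I am just a beginner with PCRE? Thanks.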

Answer 1

You can try it this way:

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")

# Start with FALSE for every string; flip to TRUE when a word qualifies
Status <- logical(length(mystring))

for (index in seq_along(mystring)) {
  # Individual characters of the current string
  chars <- strsplit(mystring[index], "")[[1]]

  for (word in words) {
    # The word itself must not occur literally in the string ...
    if (!grepl(word, mystring[index], fixed = TRUE)) {
      # ... but every one of its letters must be present somewhere in it
      if (all(strsplit(word, "")[[1]] %in% chars)) {
        Status[index] <- TRUE
        break
      }
    }
  }
}

Status
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
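
The loop above only checks that each letter of a word is present, so a repeated letter (the two d's and two e's in "desired") is not counted. A minimal vectorized sketch that also compares letter counts might look like the following. The helper names contains_letters and scrambled_only are hypothetical, and "permutation" is assumed to mean "the letters occur anywhere in the string, in any order", which is the reading that reproduces the expected output in the question.

```r
mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")

# Hypothetical helper: TRUE if `string` has at least as many copies of
# every letter as `word` does (letters may be scattered, in any order).
contains_letters <- function(word, string) {
  need <- table(strsplit(word, "")[[1]])
  # Count only the letters we care about, in the same order as `need`
  have <- table(factor(strsplit(string, "")[[1]], levels = names(need)))
  all(as.vector(have) >= as.vector(need))
}

# Hypothetical helper: TRUE if some word is present only in scrambled form,
# i.e. its letters are all there but the literal word is not.
scrambled_only <- function(string, words) {
  any(vapply(words, function(w) {
    !grepl(w, string, fixed = TRUE) && contains_letters(w, string)
  }, logical(1)))
}

result <- vapply(mystring, scrambled_only, logical(1),
                 words = words, USE.NAMES = FALSE)
print(result)
```

On the sample data this gives the same answer as the loop; the two approaches would only differ for a string that has, say, a single "d" and a single "e" when "desired" is being tested.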

Answer 2

The following code will work to some extent.

grepl("(^((?!yes|wanted|desired).)*$)", mystring, perl=TRUE)

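It only excludes the words above and does not check for permutations of their letters, so it gives the expected result only because of how your particular data looks.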
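
In that pattern, (?!yes|wanted|desired). consumes one character only if none of the three words starts at that position, and ^( ... )*$ repeats this across the whole string. To also require the letters, one possible sketch combines a negative lookahead (the literal word must be absent) with one positive lookahead per letter (each letter must be present somewhere). The helper build_branch is hypothetical, the words are assumed to contain no regex metacharacters, and lookaheads test presence only, not how many times a letter occurs.

```r
mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")

# Hypothetical helper: build one alternative of the pattern for a word.
#   (?!.*word)  negative lookahead: the literal word occurs nowhere
#   (?=.*x)     positive lookahead: the letter x occurs somewhere
build_branch <- function(word) {
  letters_needed <- unique(strsplit(word, "")[[1]])
  paste0("(?!.*", word, ")",
         paste0("(?=.*", letters_needed, ")", collapse = ""))
}

# Anchor at the start and join the per-word branches with "|"
pattern <- paste0("^(?:",
                  paste(vapply(words, build_branch, character(1)), collapse = "|"),
                  ")")

result <- grepl(pattern, mystring, perl = TRUE)
print(result)
```

For "yes" the generated branch is (?!.*yes)(?=.*y)(?=.*e)(?=.*s): starting from the beginning of the string, "yes" must not appear anywhere, while y, e and s must each appear somewhere. Lookaheads need perl = TRUE because the default TRE engine does not support them.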

regex

pcre

r