Regex excluding words in certain sequence

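I want to filter out certain fields if they do not meet a condition. The problem is the order of the words. I tried the following patterns: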

(EXCLUDING)(?!\(MONDAY)(.*MONDAY).*

(EXCLUDING)(?!\()(.*MONDAY).*

What I want is a filter that catches EXCLUDING * MONDAY, but not when there is a parenthesis between those two words. That is, I want to catch:

EXCLUDING MONDAY
EXCLUDING WEDNESDAY AND MONDAY
EXCLUDING MONDAY AND WEDNESDAY
EXCLUDING MONDAY (WEDNESDAY IS OK)

but not

EXCLUDING WEDNESDAY (MONDAY IS OK)

The expressions above of course match all of these. This is being run in R.
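To reproduce the problem, here is a minimal sketch that runs both attempted patterns against the five sample strings. The variable names `x`, `attempt1` and `attempt2` are made up for illustration, and the backslashes are doubled because that is how R string literals need them written.

```r
# The five sample strings from the question.
x <- c("EXCLUDING MONDAY",
       "EXCLUDING WEDNESDAY AND MONDAY",
       "EXCLUDING MONDAY AND WEDNESDAY",
       "EXCLUDING MONDAY (WEDNESDAY IS OK)",
       "EXCLUDING WEDNESDAY (MONDAY IS OK)")

# perl = TRUE is needed for lookahead syntax such as (?!...).
attempt1 <- grepl("(EXCLUDING)(?!\\(MONDAY)(.*MONDAY).*", x, perl = TRUE)
attempt2 <- grepl("(EXCLUDING)(?!\\()(.*MONDAY).*", x, perl = TRUE)

# Both attempts match every string, including the last one.
print(attempt1)  # [1] TRUE TRUE TRUE TRUE TRUE
print(attempt2)  # [1] TRUE TRUE TRUE TRUE TRUE
```

Both vectors are all TRUE, so the last string is not being filtered out.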

How about this?

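mystrings <- c("EXCLUDING MONDAY",
               "EXCLUDING WEDNESDAY AND MONDAY",
               "EXCLUDING MONDAY AND WEDNESDAY",
               "EXCLUDING MONDAY (WEDNESDAY IS OK)",
               "EXCLUDING WEDNESDAY (MONDAY IS OK)")

# [^(]+ matches one or more characters that are not "(", so no opening
# parenthesis may appear between EXCLUDING and MONDAY.
# Inside a character class "(" needs no escape; "\(" is an invalid escape in an R string.
grepl("EXCLUDING[^(]+MONDAY", mystrings)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE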

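If you only want to reject matches where a ( comes immediately before MONDAY, you can use a negative lookbehind. Your regex uses a negative lookahead, which is checked right after EXCLUDING, where the next character is a space, so it never rejects (MONDAY.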

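strs <- c("EXCLUDING MONDAY",
          "EXCLUDING WEDNESDAY AND MONDAY",
          "EXCLUDING MONDAY AND WEDNESDAY",
          "EXCLUDING MONDAY (WEDNESDAY IS OK)",
          "EXCLUDING WEDNESDAY (MONDAY IS OK)")

# (?<!\() is a negative lookbehind: MONDAY must not be directly preceded by "(".
# The backslash is doubled in the R string; lookbehind requires perl = TRUE.
grepl("EXCLUDING.*(?<!\\()MONDAY", strs, perl = TRUE)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE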
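The two answers agree on the five sample strings but are not equivalent in general. As a rough sketch, the two extra strings below (hypothetical inputs, not from the question) have a parenthesis somewhere between EXCLUDING and MONDAY, but not directly in front of MONDAY. Which result is wanted depends on how strictly "a parenthesis between those words" is meant.

```r
# Hypothetical extra inputs, made up for illustration only.
extra <- c("EXCLUDING WEDNESDAY (NOT MONDAY)",
           "EXCLUDING WEDNESDAY (SEE NOTE) AND MONDAY")

# Negated character class: rejects any "(" anywhere between the two words.
no_paren_between <- grepl("EXCLUDING[^(]+MONDAY", extra)

# Negative lookbehind: rejects only a "(" immediately before MONDAY.
no_paren_adjacent <- grepl("EXCLUDING.*(?<!\\()MONDAY", extra, perl = TRUE)

print(no_paren_between)   # [1] FALSE FALSE
print(no_paren_adjacent)  # [1] TRUE TRUE
```

If any parenthesis between the two words should disqualify the string, the character class version seems to be the closer fit; the lookbehind version only guards against the literal sequence (MONDAY.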