Regular expression to match words in any order but words can be optional

Question

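I have searched for a long time and could not find a regular expression that meets my requirement.

I have multiple lines of text like this: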

male positive average
average negative female
good negative female
female bad
male average
male
female
...
...

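The example above contains three groups of words: (male|female), (good|average|bad), and (positive|negative).

I want to capture each group of words in a named group: gender, quality, and feedback, respectively.

The closest I have come is: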

(?=.*(?P<gender>\b(fe)?male\b))(?=.*(?P<quality>(green|amber|red)))(?=.*(?P<feedback>(positive|negative))).*

This matches the groups gender, quality, and feedback in any order.

But it does not match, or create the named groups for, lines like these:

female green
positive male
positive female
female bad
male average
male
female

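Note: the gender (male|female) is common and appears in every line. Also, for simplicity only three groups are mentioned here; depending on requirements, there could be more.

Any help would be greatly appreciated.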

Answer 1

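You need to anchor your regular expression to the beginning of the line (^) and make each positive lookahead that contains a named capture group optional. That way a missing word leaves its group unset instead of failing the whole match.

Also, you have some numbered capture groups that could be non-capture groups, which avoids confusion since you are only interested in the named capture groups. Lastly, you left out some word boundaries.

I suggest changing your expression to the following.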

^(?=.*(?P<gender>\b(?:fe)?male\b))?(?=.*(?P<quality>\b(?:green|amber|red)\b))?(?=.*(?P<feedback>\b(?:positive|negative)\b))?.*
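
As a rough check, here is a minimal Python sketch that runs the expression above against a few lines, assuming the standard `re` module is the target engine (the question is tagged python). The helper name `parse` and the sample lines are made up for illustration. The quality words are the ones in the expression (green|amber|red), so a line that uses good/average/bad would leave `quality` unset.

```python
import re

# The suggested expression, copied verbatim and split across lines for readability.
PATTERN = re.compile(
    r"^(?=.*(?P<gender>\b(?:fe)?male\b))?"
    r"(?=.*(?P<quality>\b(?:green|amber|red)\b))?"
    r"(?=.*(?P<feedback>\b(?:positive|negative)\b))?.*"
)


def parse(line):
    """Return the named groups for one line; words that are absent map to None."""
    # Every lookahead is optional and .* can match nothing, so match() always succeeds.
    return PATTERN.match(line).groupdict()


for line in ["female green", "positive male", "amber negative female", "male"]:
    print(line, "->", parse(line))
```

Groups whose word is missing come back as `None`, which is how a caller can tell an absent word from a present one.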

The regular expression can be broken down as follows.

^                         # match beginning of line
(?=                       # begin positive lookahead
  .*                      # match zero or more characters
  (?P<gender>             # begin named capture group 'gender'
    \b                    # match a word boundary
    (?:fe)?male           # match 'male' or 'female'
    \b                    # match a word boundary
  )                       # end capture group 'gender'
)?                        # end positive lookahead and make it optional

(?=                       # begin positive lookahead
  .*                      # match zero or more characters
  (?P<quality>            # begin named capture group 'quality'
    \b                    # match a word boundary
    (?:green|amber|red)   # match one of the three words
    \b                    # match a word boundary
  )                       # end named capture group 'quality'
)?                        # end positive lookahead and make it optional

(?=                       # begin positive lookahead
  .*                      # match zero or more characters
  (?P<feedback>           # begin named capture group 'feedback'    
    \b                    # match a word boundary
    (?:positive|negative) # match one of the two words
    \b                    # match a word boundary
  )                       # end named capture group 'feedback'
)?                        # end positive lookahead and make it optional
.*                        # match zero or more characters (the line)
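
Since the question mentions that more groups may be added, and each optional lookahead in the breakdown has the same shape, the expression could be assembled from a table of groups. The following is only a sketch under that assumption: `GROUPS` and `build_pattern` are hypothetical names, and the word lists follow the three groups described in the question (good|average|bad for quality) rather than the ones in the expression.

```python
import re

# Hypothetical mapping of group name -> allowed words; extend it to add groups.
GROUPS = {
    "gender": ["male", "female"],
    "quality": ["good", "average", "bad"],
    "feedback": ["positive", "negative"],
}


def build_pattern(groups):
    """Join one optional lookahead per group into a single anchored pattern."""
    lookaheads = []
    for name, words in groups.items():
        alternatives = "|".join(re.escape(word) for word in words)
        # Same shape as each block in the breakdown: (?=.*(?P<name>\b(?:...)\b))?
        lookaheads.append(rf"(?=.*(?P<{name}>\b(?:{alternatives})\b))?")
    return re.compile("^" + "".join(lookaheads) + ".*")


pattern = build_pattern(GROUPS)
for line in ["male positive average", "average negative female", "female bad", "male"]:
    print(line, "->", pattern.match(line).groupdict())
```

With this layout, a fourth group would only need a new entry in the mapping; the word boundaries keep "male" from matching inside "female".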


python

regex

regex-group