Regular expression to match words in any order but words can be optional
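I have searched for a long time but could not find a regular expression that meets my requirements.
I have multiple lines of text like this: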
male positive average
average negative female
good negative female
female bad
male average
male
female
...
...
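In the example above there are three groups of words: (male|female), (good|average|bad), and (positive|negative).
I want to capture each group of words in a named group: gender, quality, and feedback, respectively.
The closest I have come is: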
(?=.*(?P<gender>\b(fe)?male\b))(?=.*(?P<quality>(green|amber|red)))(?=.*(?P<feedback>(positive|negative))).*
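This matches the gender, quality, and feedback groups in any order.
But it does not match, and so does not create the named groups, for lines of the following kinds: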
female green
positive male
positive female
female bad
male average
male
female
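Note: gender (male|female) is common and will appear on every line. Also, for simplicity, only three different groups are mentioned here; depending on requirements, there could be more.
Any help would be greatly appreciated.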
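You need to anchor your regular expression to the beginning of the line (^) and make each positive lookahead that contains a named capture group optional.
Also, you have some numbered capture groups that could be non-capturing groups, which avoids confusion, since you are only interested in the named capture groups. Lastly, you are missing some word boundaries.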
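To illustrate the last two points, here is a small Python sketch. The input "bored male" is a made-up example chosen only to show a false match; it does not come from the question.

```python
import re

# Without word boundaries, 'red' is found inside the longer word 'bored'.
loose = re.compile(r"(?P<quality>(green|amber|red))")

# With \b on both sides and a non-capturing (?:...) group, only whole words match.
strict = re.compile(r"(?P<quality>\b(?:green|amber|red)\b)")

print(loose.search("bored male"))   # a match object for 'red'
print(strict.search("bored male"))  # None

# Number of capture groups: the inner numbered group disappears in the strict form.
print(loose.groups, strict.groups)
```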
I suggest you change your expression to the following.
^(?=.*(?P<gender>\b(?:fe)?male\b))?(?=.*(?P<quality>\b(?:green|amber|red)\b))?(?=.*(?P<feedback>\b(?:positive|negative)\b))?.*
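As a quick check of the expression above, here is a minimal Python sketch (Python's re module accepts the (?P<name>...) syntax used here). The helper name parse_line and the test lines are assumptions for illustration. Note that the expression lists green|amber|red for quality, so lines that use good, average, or bad would need those words added to the alternation.

```python
import re

# The asker's original attempt: every lookahead is mandatory.
ORIGINAL = re.compile(
    r"(?=.*(?P<gender>\b(fe)?male\b))"
    r"(?=.*(?P<quality>(green|amber|red)))"
    r"(?=.*(?P<feedback>(positive|negative))).*"
)

# The suggested pattern, copied unchanged: anchored, lookaheads optional.
SUGGESTED = re.compile(
    r"^(?=.*(?P<gender>\b(?:fe)?male\b))?"
    r"(?=.*(?P<quality>\b(?:green|amber|red)\b))?"
    r"(?=.*(?P<feedback>\b(?:positive|negative)\b))?.*"
)

def parse_line(line):
    """Return the named groups for one line; a missing word gives None."""
    return SUGGESTED.match(line).groupdict()

for line in ["amber negative female", "positive male", "female green", "male"]:
    # ORIGINAL only matches when all three kinds of word are present.
    print(line, "|", bool(ORIGINAL.match(line)), "|", parse_line(line))
```

Because every lookahead is optional and the expression ends in .*, SUGGESTED.match succeeds on any line, so the caller only has to check which groups came back as None.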
The regular expression can be broken down as follows.
^                          # match beginning of line
(?=                        # begin positive lookahead
  .*                       # match zero or more characters
  (?P<gender>              # begin named capture group 'gender'
    \b                     # match a word boundary
    (?:fe)?male            # match 'female' or 'male'
    \b                     # match a word boundary
  )                        # end named capture group 'gender'
)?                         # end positive lookahead and make it optional
(?=                        # begin positive lookahead
  .*                       # match zero or more characters
  (?P<quality>             # begin named capture group 'quality'
    \b                     # match a word boundary
    (?:green|amber|red)    # match one of the three words
    \b                     # match a word boundary
  )                        # end named capture group 'quality'
)?                         # end positive lookahead and make it optional
(?=                        # begin positive lookahead
  .*                       # match zero or more characters
  (?P<feedback>            # begin named capture group 'feedback'
    \b                     # match a word boundary
    (?:positive|negative)  # match one of the two words
    \b                     # match a word boundary
  )                        # end named capture group 'feedback'
)?                         # end positive lookahead and make it optional
.*                         # match zero or more characters (the line)
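Since the question notes that the number of groups may grow, the same expression can be generated from a table of word lists instead of being written by hand. The following Python sketch is one possible way to do that; the GROUPS table and the function name build_pattern are assumptions for illustration and are not part of the original answer.

```python
import re

# Hypothetical table: group name -> allowed words. Extend as needed.
GROUPS = {
    "gender": ["female", "male"],
    "quality": ["green", "amber", "red"],
    "feedback": ["positive", "negative"],
}

def build_pattern(groups):
    """Build an anchored pattern with one optional lookahead per group."""
    lookaheads = []
    for name, words in groups.items():
        # Escape each word in case it contains regex metacharacters.
        alternation = "|".join(re.escape(word) for word in words)
        lookaheads.append(rf"(?=.*(?P<{name}>\b(?:{alternation})\b))?")
    return re.compile("^" + "".join(lookaheads) + ".*")

pattern = build_pattern(GROUPS)
print(pattern.match("red male").groupdict())
```

Adding a fourth group is then a one-line change to the table, and the group names must be valid Python identifiers because they are used as (?P<name>...) names.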