How to avoid capturing groups in RegEx splitting result?

Question

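I'm trying to use re to match a pattern that starts with '\n', followed by an optional 'real(r8)', then zero or more whitespace characters, then the word 'function', and I want to split the string wherever a match occurs. So for this string,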

text = '''functional \n   function disdat \nkitkat function wakawak\nreal(r8) function noooooo \ndoit'''

I would like to get:

['functional ',
 ' disdat \nkitkat function wakawak',
 ' noooooo \ndoit']

However,

import re

regex = re.compile(r'''\n(real\(r8\))?\s*\bfunction\b''')

regex.split(text)

returns

['functional ',
 None,
 ' disdat \nkitkat function wakawak',
 'real(r8)',
 ' noooooo \ndoit']

split returns the matched groups as well. How can I tell it not to?

Answer 1

You can use a non-capturing group, like this:

>>> regex = re.compile(r'\n(?:real\(r8\))?\s*\bfunction\b')
>>> regex.split(text)
['functional ', ' disdat \nkitkat function wakawak', ' noooooo \ndoit']

Note the ?: in (?:real\(r8\)). Quoting the Python documentation for (?:...):

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.
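
To see the difference side by side, here is a small sketch that runs both patterns on the text from the question. The variable names (capturing, non_capturing, pieces_only) and the slicing fallback at the end are my own additions for illustration, not part of the answer above. The slice assumes the pattern contains exactly one capturing group.

```python
import re

text = '''functional \n   function disdat \nkitkat function wakawak\nreal(r8) function noooooo \ndoit'''

# Capturing group: re.split() inserts the group's text (or None when the
# optional group did not take part in the match) between the split pieces.
capturing = re.compile(r'\n(real\(r8\))?\s*\bfunction\b')
with_groups = capturing.split(text)

# Non-capturing group: only the pieces between matches are returned.
non_capturing = re.compile(r'\n(?:real\(r8\))?\s*\bfunction\b')
without_groups = non_capturing.split(text)

# Fallback if the capturing group has to stay in the pattern: with exactly
# one group, pieces and group values alternate, so every second item is a piece.
pieces_only = with_groups[::2]

print(with_groups)
print(without_groups)
print(pieces_only == without_groups)
```

The non-capturing group is usually the simpler fix. The slice is only worth considering when the same compiled pattern is also used elsewhere to read the group's value.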

Tags: python, regex, string, split