How nested "(" works in Python regular expressions

Question

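I am trying to extract the values inside (...), which may or may not span multiple lines. I am using a "nested (" in my regular expression, but it does not work as expected. For simplicity, I reduced the regular expression to the following:

Code snippet for reference: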

RE_MULTI_LINE_PARAMS = ".*"
RE_M_DECLERATION = r"\(%s\)"%RE_MULTI_LINE_PARAMS
...
# read file
fh = open(fname)
fcontent = fh.read()
patternList = re.findall(RE_M_DECLERATION, fcontent, re.VERBOSE)
print patternList

In the other case, I use:

RE_MULTI_LINE_PARAMS = "(.*)"

The rest of the code is the same as above, but I see a difference in the resulting list.

Could someone explain why it behaves this way, or how nested brackets work in regular expressions?

Answer 1

I am trying to extract the values in the (...) , which may or may not be multiline.

If you want .* to be able to match newlines, you need to pass flags=re.DOTALL.
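As a rough sketch of what that flag changes (the sample text and variable names here are made up for illustration, not taken from the asker's file):

```python
import re

# Hypothetical input: one parenthesised list spans two lines, one does not.
text = "foo(a,\n    b) bar(c)"

# Without DOTALL, "." stops at "\n", so the two-line parentheses are skipped.
without_dotall = re.findall(r"\(.*?\)", text)

# With DOTALL, "." also matches "\n", so both are found.
with_dotall = re.findall(r"\(.*?\)", text, flags=re.DOTALL)

print(without_dotall)  # ['(c)']
print(with_dotall)     # ['(a,\n    b)', '(c)']
```

If you need another flag as well, flags can be combined with |, e.g. re.DOTALL | re.VERBOSE.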

I see difference in the resultant list.

See the findall documentation:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

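When you add a group to the regular expression, findall returns only the text inside that group, i.e. without the parentheses. When you do not include a group, it returns the entire matched text, including the parentheses.

OK, that does not sound very clear, so I'll expand a bit. You are right that using ( and ) in a regular expression does not change which strings the expression matches. What the parentheses do is mark a part of the match to be captured in a numbered "group". So both of your examples find the same number of matches. Once a match is found, however, findall behaves differently depending on whether any groups are defined. If you define exactly one group, it returns the contents of that group rather than the whole match. For example: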

>>> import re
>>> a = re.compile(r'\(.*?\)')
>>> b = re.compile(r'\((.*?)\)')
>>> s = 'one (two) three (four) five'
>>> a.findall(s)
['(two)', '(four)']
>>> b.findall(s)
['two', 'four']

Both regular expressions match the same strings:

>>> [match.group(0) for match in a.finditer(s)]
['(two)', '(four)']
>>> [match.group(0) for match in b.finditer(s)]
['(two)', '(four)']

But one of them has a capturing group that selects part of the string:

>>> [match.groups() for match in b.finditer(s)]
[('two',), ('four',)]
>>> [match.groups() for match in a.finditer(s)]
[(), ()]
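The same idea carries over to groups that are actually nested inside each other. Here is a small sketch; the pattern is made up for illustration and is not from the question. Groups are numbered by the position of their opening parenthesis, and with more than one group findall returns tuples, as the documentation quoted above says:

```python
import re

s = 'one (two) three (four) five'

# \( and \) are literal parentheses.
# Group 1 is the outer pair: the whole word inside the literal parentheses.
# Group 2 is the inner pair: just the first character of that word.
nested = re.compile(r'\(((\w)\w*)\)')

tuples = nested.findall(s)
whole = [m.group(0) for m in nested.finditer(s)]

print(tuples)  # [('two', 't'), ('four', 'f')]
print(whole)   # ['(two)', '(four)']
```

If you want parentheses only for grouping and not for capturing, (?:...) creates a non-capturing group that findall ignores.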

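Apart from these issues, you will find that .* matches as much as possible. So for the string "one (two) three (four)" you will not get matches on two and four, but a single match on two) three (four. You can use a non-greedy match such as .*?, or match non-parenthesis characters instead, e.g. [^)]*.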
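A quick comparison of the three variants on that string (variable names are just for illustration):

```python
import re

s = "one (two) three (four)"

# Greedy: .* runs to the last ")" in the string.
greedy = re.findall(r"\((.*)\)", s)

# Non-greedy: .*? stops at the first ")" it can.
lazy = re.findall(r"\((.*?)\)", s)

# Negated class: anything except ")", so it can never run past one.
negated = re.findall(r"\(([^)]*)\)", s)

print(greedy)   # ['two) three (four']
print(lazy)     # ['two', 'four']
print(negated)  # ['two', 'four']
```

One side effect worth knowing: [^)] also matches newlines, so that version handles multi-line parentheses even without re.DOTALL.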
