What do round brackets in Regex mean?

Question

I don't understand why the regex ^(.)+$ matches only the last letter of a string. I thought it would match the whole string.

Example in Python:

>>> import re
>>> text = 'This is a sentence'
>>> re.findall('^(.)+$', text)
['e']

Answer 1

re.findall returns something different if the pattern contains one (or more) capturing groups:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.

And according to the MatchObject.group documentation:

If a group matches multiple times, only the last match is accessible:
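
A minimal sketch of that behaviour, reusing the text from the question (the names m, whole_match and last_capture are only illustrative): group(0) is the whole match, while group(1) holds just the last character the repeated group matched.

```python
import re

text = 'This is a sentence'

# The group (.) is repeated by +, so its capture is overwritten on every
# repetition; only the final one survives.
m = re.match(r'^(.)+$', text)

whole_match = m.group(0)   # the entire matched string
last_capture = m.group(1)  # only the last repetition of the group

print(whole_match)
print(last_capture)
```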

If you want to get the whole string, use a non-capturing group:

>>> re.findall('^(?:.)+$', text)
['This is a sentence']

Or don't use a capturing group at all:

>>> re.findall('^.+$', text)
['This is a sentence']

Or change the group so that it captures everything:

>>> re.findall('^(.+)$', text)
['This is a sentence']
>>> re.findall('(^.+$)', text)
['This is a sentence']

Alternatively, you can use re.finditer, which yields match objects. Using MatchObject.group(), you can get the whole matched string:

>>> [m.group() for m in re.finditer('^(.)+$', text)]
['This is a sentence']
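
As a rough side-by-side of the documented re.findall rule, the sketch below runs the same text through patterns with zero, one and two capturing groups. The two-group pattern and the variable names are made up for illustration and are not part of the question.

```python
import re

text = 'This is a sentence'

# No group: findall returns the whole matches.
no_group = re.findall(r'^.+$', text)

# One group: findall returns what the group captured (its last repetition).
one_group = re.findall(r'^(.)+$', text)

# Two groups (hypothetical pattern): findall returns a list of tuples.
two_groups = re.findall(r'^(.)(.)+$', text)

print(no_group)
print(one_group)
print(two_groups)
```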

Answer 2

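Because the capturing group holds only a single character, (.). Because of the + quantifier, the regex engine keeps matching through the whole string, and each time it updates the capturing group to the most recent match. In the end, the capturing group holds the last character.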

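Even though you are using findall, the first time the regex is applied, the + quantifier makes it keep matching all the way to the end of the string. Since the end of the string has been reached, the regex is not applied again, and the call returns just one result.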

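If you remove the + quantifier (together with the ^ and $ anchors, which would otherwise only allow a one-character string to match), the regex matches just one character the first time, so it is applied again and again until the whole string is consumed, and findall returns a list of all the characters in the string.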
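
A small sketch of this case, with illustrative variable names. Note that the anchors have to go as well: with ^ and $ kept, a single (.) cannot span a multi-character string, so nothing matches.

```python
import re

text = 'This is a sentence'

# Without + and without anchors, the one-character pattern is applied
# repeatedly, producing one result per character.
unanchored = re.findall(r'(.)', text)

# With the anchors kept, the pattern only fits a one-character string.
anchored = re.findall(r'^(.)$', text)

print(unanchored)
print(anchored)
```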

Answer 3

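Note that + is greedy by default, so it matches all characters up to the last one. Because the capturing group contains only the dot, the regex above matches every character from the start but captures only the last one. Since findall gives precedence to groups, it returns only the character held in the group. To capture the whole string, move the quantifier inside the group: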

re.findall('^(.+)$', text)

python

regex

capture-group