Understanding the * (zero or more) operator using re.search()

Question

I am new to Python and am working through the "Google for Education" Python course.

Right now, the following line confuses me:

* -- 0 or more occurrences of the pattern to its left

(All examples are in Python 3.)

Example 1

In [1]: re.search(r"pi*", "piiig!!").group()
Out[1]: 'piii'

This makes sense, because "pi" occurs once, so it is returned.

Example 2

In [2]: re.search(r"i*", "piiig!!").group()
Out[2]: ''

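Why doesn't this return "i"? As I understand it, it should return "iii". But the result is an empty string.

Also, what exactly does "0 or more" mean? I searched on Google, but everywhere it is just described as * -- 0 or more. If an expression occurs 0 times, doesn't the match succeed even when the expression is not there? What is the point of searching, then?

I am very confused by this. Can you explain it or point me in the right direction?

I hope the right explanation will also resolve this case: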

In [3]: re.search(r"i?", "piiig!!").group()
Out[3]: ''

I tried the examples in Spyder 3.2.4.

Answer 1

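You need to use * (0 or more) and + (1 or more) correctly to get the output you want.

Example 1 matches because * applies only to "i": the pattern matches a "p" followed by zero or more "i", so it captures "p", "pi", "pii", and so on.

Example 2: if you only need to match "i", use "+" instead of "*".

If you use "*":

In: re.search(r"pi*g", "piiig!!").group()

this returns a match if your input contains "pg", "pig", or "piig".

If you use "+":

In: re.search(r"pi+g", "piiig!!").group()

this returns a match if your input contains "pig" or "piig", but not "pg".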
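The difference between the two quantifiers can be checked with a small sketch. The helper name `match_or_none` and the three input strings are made up for illustration; only `re.search` comes from the standard library.

```python
import re

def match_or_none(pattern, text):
    """Return the matched text, or None when re.search finds nothing."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Hypothetical inputs: "p" followed by zero, one, and two "i", then "g".
inputs = ["pg", "pig", "piig"]

# "*" allows zero "i", so "pg" matches too.
star_results = {t: match_or_none(r"pi*g", t) for t in inputs}
# "+" requires at least one "i", so "pg" does not match.
plus_results = {t: match_or_none(r"pi+g", t) for t in inputs}

print(star_results)  # {'pg': 'pg', 'pig': 'pig', 'piig': 'piig'}
print(plus_results)  # {'pg': None, 'pig': 'pig', 'piig': 'piig'}
```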

Answer 2

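The special character * means 0 or more occurrences of the preceding character. For example, a* matches 0 or more occurrences of a, which can be '', 'a', 'aa', and so on. You get '' because the empty string counts as 0 occurrences of i, and that match is already available at the start of the string. To get iii, use + instead of *, which gives you the first non-empty run of 'i', i.e. iii: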

re.search("i+", "piiig!!").group()

Answer 3

Because '' is the first match of r'i*', and 'iii' is the second match:

In [1]: import re

In [2]: re.findall(r'i*', 'piiig!!')
Out[2]: ['', 'iii', '', '', '', '']

This site also explains how the regular expression works: https://regex101.com/r/XVPXMv/1

Answer 4

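The explanation is a bit more complicated than the answers we have seen so far.

First, unlike re.match(), the primitive operation re.search() checks for a match anywhere in the string (this is what Perl does by default) and finds the pattern once: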

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding MatchObject instance. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. See: Ref.
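The distinction the quoted documentation draws between no match at all and a zero-length match can be checked directly. This is a minimal sketch; the input string "pg" and the variable names are made up for illustration.

```python
import re

# "i*" accepts zero "i", so it succeeds with a zero-length match at position 0.
empty = re.search(r"i*", "pg")
# "i+" needs at least one "i", which "pg" does not contain, so there is no match.
missing = re.search(r"i+", "pg")

print(empty is not None)    # True
print(repr(empty.group()))  # ''
print(empty.span())         # (0, 0)
print(missing)              # None
```

So an empty result from .group() still means the search succeeded, whereas None means it failed.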

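If we follow every step of the regex engine while it tries to find a match, we can observe the following for the pattern i* and the test string piigg!!:

The first character (at position 0) produces a match, because p is zero times i, and the result is an empty match (and not p, because we are not searching for p or any other character).
At the second character (position 1) the second match is found (spanning to position 2), since ii is zero or more times i. At position 3 there is another empty match, and so on.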
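The steps described above can be reproduced with re.finditer(), which yields every match together with its position. This is only a sketch of how one might inspect the matches, not a trace of the engine's internals; the variable names are illustrative, and the end offsets printed are exclusive (so the match on ii is reported as 1 to 3).

```python
import re

text = "piigg!!"

# Collect (start, end, matched text) for every match of i* in the string.
trace = [(m.start(), m.end(), m.group()) for m in re.finditer(r"i*", text)]

for start, end, matched in trace:
    print(start, end, repr(matched))

# The first entry is the empty match at position 0,
# the second is 'ii' starting at position 1.
```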

Because re.search only returns the first match, it sticks with the first empty match at position 0. That is why you get the (confusing) result you posted:

In [2]: re.search(r"i*", "piiig!!").group()
Out[2]: ''
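To see where each search stops, one can look at the span of the match instead of its text. The sketch below applies the same reasoning to the i? case from the question; the dictionary name `spans` is just for illustration.

```python
import re

text = "piiig!!"

# Span (start, end) of the first match for each pattern.
spans = {p: re.search(p, text).span() for p in (r"i*", r"i?", r"i+")}

print(spans["i*"])  # (0, 0): zero "i" is enough, so the search stops at position 0
print(spans["i?"])  # (0, 0): zero or one "i", same reasoning
print(spans["i+"])  # (1, 4): needs at least one "i", so it skips "p" and takes "iii"
```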

To get every match, you need re.findall():

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match. See: Ref.
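If only the non-empty matches are of interest, two simple options are to filter the result of findall or to use + in the pattern. This is a small sketch with illustrative variable names:

```python
import re

text = "piiig!!"

# Option 1: keep i* but drop the empty strings from the result.
non_empty = [m for m in re.findall(r"i*", text) if m]

# Option 2: require at least one "i" so empty matches never occur.
with_plus = re.findall(r"i+", text)

print(non_empty)  # ['iii']
print(with_plus)  # ['iii']
```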
