Check that both `lookbehind` conditions are satisfied in `RegEx`

Question

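I am trying to use the lookbehind mechanism, combining two conditions, to check whether a username is preceded by RT @ or RT@, as described in this tutorial. The regex and a sample input are shown in Example 1:

Example 1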

import re

text = 'RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3'

mt_regex = r'(?i)(?<!RT )&(?<!RT)@(\w+)'

mt_pat = re.compile(mt_regex)

re.findall(mt_pat, text)

This outputs [] (an empty list), whereas the desired output is:

['u2', 'u4', 'u3', 'u1']

What am I missing? Thanks in advance.

Answer 1

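The other answer is clearly correct and rightly accepted, but I thought this might be useful to you as well, and it avoids negative lookbehinds. The benefit is that, by using \s*, you are not limited to a single space character: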

(?i)(?:^|[,.])\s*@(\w+)

See the online demo

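(?i) - Case-insensitive matching. Note that you can also use re.IGNORECASE.
(?:^|[,.]) - Non-capturing group that matches the start of the string or a literal comma/dot.
\s* - Zero or more whitespace characters.
@ - Matches "@" literally.
(\w+) - Capturing group that matches one or more word characters, [A-Za-z0-9_] for ASCII input.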

This prints ['u2', 'u4', 'u3', 'u1'].
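As a minimal sketch of the same approach run end to end (the names `pattern` and `find_mentions` are only illustrative, and re.IGNORECASE is used here in place of the inline (?i) flag):

```python
import re

text = 'RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3'

# Same pattern as above; the inline (?i) is replaced by the re.IGNORECASE flag.
pattern = re.compile(r'(?:^|[,.])\s*@(\w+)', re.IGNORECASE)

def find_mentions(s):
    """Return usernames whose '@' follows the start of the string, a comma or a dot."""
    return pattern.findall(s)

print(find_mentions(text))  # ['u2', 'u4', 'u3', 'u1']
```

Note that this pattern only accepts an @ that comes after the start of the string, a comma or a dot (plus optional whitespace), so a mention after other text such as 'hi @u7' is skipped, which differs from the lookbehind version.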

Answer 2

If we break down your regex:

r"(?i)(?<!RT )&(?<!RT)@(\w+)"
(?i)        match the remainder of the pattern, case insensitive match
(?<!RT )    negative lookbehind
            asserts that 'RT ' does not match
&           matches the character '&' literally
(?<!RT)     negative lookbehind 
            asserts that 'RT' does not match
@           matches the character '@' literally
(\w+)       Capturing Group    
            matches [a-zA-Z0-9_] between one and unlimited times

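Your & character is what prevents your regex from matching. It is not an "and" operator; it matches a literal '&', and the text contains none. Two lookbehinds placed one after the other must already both succeed at the same position, so simply remove it: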

import re

text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"
mt_regex = r"(?i)(?<!RT )(?<!RT)@(\w+)"
mt_pat = re.compile(mt_regex)

print(re.findall(mt_pat, text))
# ['u2', 'u4', 'u3', 'u1']

See this regex here
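To make the difference concrete, here is a small sketch (the variable names and the extra sample string "a &@u9" are made up for illustration) comparing the pattern with and without the &:

```python
import re

text = "RT @u1, @u2, u3, @u4, rt @u5:, @u3.@u1^, rt@u3"

with_amp = re.compile(r"(?i)(?<!RT )&(?<!RT)@(\w+)")
without_amp = re.compile(r"(?i)(?<!RT )(?<!RT)@(\w+)")

# text contains no '&', so the first pattern can never match.
print(with_amp.findall(text))      # []

# Adjacent lookbehinds are both checked at the position just before '@'.
print(without_amp.findall(text))   # ['u2', 'u4', 'u3', 'u1']

# Hypothetical input with a literal '&' directly before '@'.
print(with_amp.findall("a &@u9"))  # ['u9']
```

The last call shows that the original pattern only matches when a real '&' character sits immediately before the '@'.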

regex

python-3.x

python-re