Regular expression to return all match occurrences

Question

I have the following text:

02052020 02:40:02.445: Vacation Allowance: 21; nnnnnn Vacation Allowance: 22;nnn

I want to extract the following in Python:

Vacation Allowance: 21
Vacation Allowance: 22

Basically, I want to extract every occurrence of "Vacation Allowance:" together with the numeric value that is followed by a ; character.

I am using the following regular expression:

(.*)(Vacation Allowance:)(.*);(.*)

The complete Python code is below:

import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

pattern = re.compile(r'(.*)(Vacation Allowance:)(.*);(.*)')

for (a,b,c,d) in re.findall(pattern, text):
    print(b, " ", c)

This does not return all occurrences; it returns only the last one. The current output is:

Vacation Allowance: 22

Could you advise how I can extract all occurrences?

Answer 1

In JavaScript it would be 'text'.match(/\bVacation Allowance: \d+/g)

You need the global flag g.
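Since the question is about Python, here is a rough Python counterpart of the JavaScript one-liner above. It is only a sketch: Python's re module has no g flag, because re.findall already scans the whole string, and when the pattern contains no capture groups each result is the full matched text.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# re.findall is "global" by default: it returns every non-overlapping match.
# The pattern has no capture groups, so each item is the whole match.
matches = re.findall(r'\bVacation Allowance: \d+', text)
print(matches)
```

This should print the two full "Vacation Allowance: NN" strings rather than only the last one.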

Answer 2

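The problem is the regular expression itself. The (.*) groups match more of the string than you realize: .* is a greedy operation, so it consumes as much of the string as it can while still allowing the overall pattern to match. The leading (.*) therefore swallows everything up to the last "Vacation Allowance:", which is why you only see one result.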
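To make the greedy behavior concrete, the sketch below runs the pattern from the question next to a variant that drops the leading and trailing (.*) and uses the lazy quantifier .*? instead. The variable names greedy and lazy are only illustrative.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# Pattern from the question: the leading (.*) is greedy, so it consumes
# everything up to the LAST "Vacation Allowance:", leaving a single match.
greedy = re.findall(r'(.*)(Vacation Allowance:)(.*);(.*)', text)
print(len(greedy))    # number of matches found
print(greedy[0][2])   # third group: the value from the last occurrence only

# Without the surrounding (.*) and with a lazy .*?, each match stops at
# the first ';', so every occurrence is found separately.
lazy = re.findall(r'(Vacation Allowance:)(.*?);', text)
print(lazy)
```

The greedy version yields one tuple whose third group is ' 22', while the lazy version yields one tuple per occurrence.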

I suggest matching Vacation Allowance:\s*\d+; or something similar.

import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'
# Raw string keeps \s and \d intact; the group captures only the number.
m = re.findall(r'Vacation Allowance:\s*(\d+);', text)
print(m)

Result: ['21', '22']
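The result above contains only the numbers. If the label should be printed as well, as in the desired output from the question, one possible variation uses re.finditer with named groups. This is a sketch, not the answerer's code: the group names label and days are hypothetical, chosen here for illustration.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# Named groups (hypothetical names) separate the label from the number;
# the trailing ';' must be present but is not part of either group.
pattern = re.compile(r'(?P<label>Vacation Allowance):\s*(?P<days>\d+);')

lines = []
for match in pattern.finditer(text):
    lines.append(f"{match.group('label')}: {match.group('days')}")

for line in lines:
    print(line)

# The captured digits can also be converted to integers if needed.
values = [int(match.group('days')) for match in pattern.finditer(text)]
```

Using finditer gives access to each match object, which is convenient when both the full text and individual groups are needed.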

