Regex grouping and extraction of a particular group based on regex lookbehind

Question

I want to extract the number that appears after 'ID' in the following text. This is how I was able to get it:

import re

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."

# 'ID' needs to be present as a mandatory group
regex = r'(id)(.*?)(\d+)'

rg = re.compile(regex, re.IGNORECASE | re.DOTALL)
m = rg.search(txt)
if m:
    print(m.group(3))  # group 3 holds the digits

When I run the code above, it prints:

147894886

Here is the problem.

If txt becomes this:

txt="Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

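and the word "Trans" appears right before "ID", then I don't want to extract the number. How can I do this in the regex? In other words, if "TransID" precedes the number, the number should not be extracted; it should be extracted only when a standalone "ID" precedes it.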
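A minimal sketch of the problem, assuming Python 3 (the single-argument `print(...)` calls also run under 2.7): the question's pattern finds the "ID" inside "TransID" just as readily as a standalone "ID", so both strings yield the same number.

```python
import re

# The two sample strings from the question
txt_id = ("Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by "
          "Company/Distributor/Retailer:Rs1.37, ID 147894886.")
txt_trans = ("Recharge done on 28-12-2017 04:57PM,MRP:Rs9.00,GST 18% payable by "
             "Company/Distributor/Retailer:Rs1.37, TransID 147894886.")

# The question's pattern: nothing stops "id" from matching inside "TransID"
rg = re.compile(r'(id)(.*?)(\d+)', re.IGNORECASE | re.DOTALL)

for txt in (txt_id, txt_trans):
    m = rg.search(txt)
    print(m.group(3) if m else None)
# prints 147894886 for both strings
```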

Answer 1

You can use a negative lookbehind [doc]:

(?<!trans)(id)(.*?)(\d+)

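A minimal sketch of the negative lookbehind applied to both inputs from the question. The strings are shortened to the part that matters, and the `extract` helper is hypothetical, introduced only for illustration.

```python
import re

# Shortened versions of the two sample strings from the question
txt_id = "payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."
txt_trans = "payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

# (?<!trans) rejects an "id" immediately preceded by "trans";
# IGNORECASE applies inside the lookbehind too, so "Trans" is rejected.
rg = re.compile(r'(?<!trans)(id)(.*?)(\d+)', re.IGNORECASE | re.DOTALL)

def extract(txt):
    # Hypothetical helper: return the digits in group 3, or None if no match
    m = rg.search(txt)
    return m.group(3) if m else None

print(extract(txt_id))     # 147894886
print(extract(txt_trans))  # None
```

Python's `re` module only accepts fixed-width lookbehinds; `(?<!trans)` is always five characters wide, so it compiles without complaint.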

Alternatively, as Sebastian Proske suggested, you can use a word boundary:

\b(id)(.*?)(\d+)

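The same check with the word-boundary version, again with shortened strings and a hypothetical `extract` helper. The extra input `"Invalid 123"` is also hypothetical (not from the question); it illustrates one difference between the two approaches: the lookbehind only excludes "trans", while `\b` excludes any "id" glued to a preceding word character.

```python
import re

# Shortened versions of the two sample strings from the question
txt_id = "payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."
txt_trans = "payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

# \b needs a word/non-word transition before "id"; in "TransID" the
# preceding "s" is a word character, so there is no boundary there.
rg = re.compile(r'\b(id)(.*?)(\d+)', re.IGNORECASE | re.DOTALL)

def extract(txt):
    # Hypothetical helper: return the digits in group 3, or None if no match
    m = rg.search(txt)
    return m.group(3) if m else None

print(extract(txt_id))     # 147894886
print(extract(txt_trans))  # None

# Hypothetical input: "id" at the end of a longer word
other = "Invalid 123"
lookbehind = re.compile(r'(?<!trans)(id)(.*?)(\d+)', re.IGNORECASE | re.DOTALL)
print(lookbehind.search(other).group(3))  # 123 (only "trans" is excluded)
print(extract(other))                      # None (\b rejects it)
```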

Answer 2

You can use a word boundary (\b) to make sure that ID is a whole word.

\b(id)(.*?)(\d+)

It may also help to make your pattern less general. If ID is always followed by a space and then 9 digits, you can use this regex:

\b(id)([ ])(\d{9})

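A sketch of the stricter pattern, under the answer's stated assumption that "ID" is always followed by exactly one space and nine digits. The strings are shortened versions of the question's inputs, and `re.DOTALL` is omitted because the pattern contains no `.`.

```python
import re

# Shortened versions of the two sample strings from the question
txt_id = "payable by Company/Distributor/Retailer:Rs1.37, ID 147894886."
txt_trans = "payable by Company/Distributor/Retailer:Rs1.37, TransID 147894886."

# Assumes "ID" is always followed by exactly one space, then nine digits
rg = re.compile(r'\b(id)([ ])(\d{9})', re.IGNORECASE)

for txt in (txt_id, txt_trans):
    m = rg.search(txt)
    print(m.group(3) if m else None)
# prints 147894886, then None
```

One caveat: `\d{9}` also matches the first nine digits of a longer run of digits; appending `\b`, as in `(\d{9})\b`, would reject that case.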

Tags: python, regex, regex-negation, python-2.7, regex-lookarounds