Regex match with negative lookahead and non-greedy quantifier

Question

Source string:

<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/><html name="abc:///needed.txt" src="abc/needed.txt" location="123" sap="rtyghu"/><html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/><html name="abc:///onemore.txt" src="abc/onemore.txt" location="123" sap="dfrtyu"/>

How do I match the parts that start with <html name=" not followed by (needed) or (onemore), and that end with />?

So in this string there should be two matches:

<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/>
<html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/>

I tried this: <html name=(?!(needed|onemore)).*?"\/>

It doesn't work, and I am confused about how non-greedy matching and negative lookahead interact.

Answer 1

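Besides restricting where the engine should give up, you also need a repetition quantifier inside the lookahead. Your lookahead only inspects the position immediately after name=, whereas [^"]* lets it scan the rest of the attribute value: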

<html\s+name="(?![^"]*(?:needed|onemore))[^>]*>
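To see what the pattern above selects, here is a minimal sketch that runs it against the source string from the question. It assumes Python's re module as a stand-in for the PCRE engine tagged on the question; the names SOURCE and PATTERN are illustrative only.

```python
import re

# Source string from the question, split across lines for readability.
SOURCE = (
    '<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/>'
    '<html name="abc:///needed.txt" src="abc/needed.txt" location="123" sap="rtyghu"/>'
    '<html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/>'
    '<html name="abc:///onemore.txt" src="abc/onemore.txt" location="123" sap="dfrtyu"/>'
)

# The lookahead scans the whole name value ([^"]*) for either word
# before the rest of the tag ([^>]*>) is consumed.
PATTERN = re.compile(r'<html\s+name="(?![^"]*(?:needed|onemore))[^>]*>')

# The pattern has no capturing groups, so findall returns the full matches.
matches = PATTERN.findall(SOURCE)
for m in matches:
    print(m)
```

Only the something.txt and Another.txt elements should be printed.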

Answer 2

Here is a breakdown of your regex <html name=(?!(needed|onemore)).*?"\/>:

<html name=(?!(needed|onemore)).*?"\/>
1) Literal match: <html name=
2) Not followed by: "needed" or "onemore"
3) Lazy grab all: .*?
  Until Literal match: "/>
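The breakdown above can be checked with a small sketch (again assuming Python's re module in place of PCRE; SOURCE and ATTEMPT are illustrative names). The lookahead runs once, right after name=, where the next character is the opening quote, so it never sees "needed" or "onemore" and every element is accepted.

```python
import re

# Source string from the question, split across lines for readability.
SOURCE = (
    '<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/>'
    '<html name="abc:///needed.txt" src="abc/needed.txt" location="123" sap="rtyghu"/>'
    '<html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/>'
    '<html name="abc:///onemore.txt" src="abc/onemore.txt" location="123" sap="dfrtyu"/>'
)

# The original attempt, unchanged.
ATTEMPT = re.compile(r'<html name=(?!(needed|onemore)).*?"\/>')

# The pattern contains a capturing group, so use finditer and group(0)
# to get the full matches rather than the (empty) group contents.
attempt_matches = [m.group(0) for m in ATTEMPT.finditer(SOURCE)]
print(len(attempt_matches))  # all four elements match, including the unwanted ones
```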

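What you need to do is check every character you consume for "needed" or "onemore", using another group like this: <html name=(?:(?!(needed|onemore)).)*?"\/>. This checks, before each character is consumed, that "needed" or "onemore" does not come next. (I would also suggest using [^>] instead of . so that you don't need the lazy quantifier.)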
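A minimal sketch of the per-character check, assuming Python's re module in place of PCRE. TEMPERED_LAZY is the pattern from this answer; TEMPERED_CLASS is one possible reading of the [^>] suggestion and is an assumption, not a pattern given in the answer. Note that both reject the words anywhere inside the tag, not only in the name attribute.

```python
import re

# Source string from the question, split across lines for readability.
SOURCE = (
    '<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/>'
    '<html name="abc:///needed.txt" src="abc/needed.txt" location="123" sap="rtyghu"/>'
    '<html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/>'
    '<html name="abc:///onemore.txt" src="abc/onemore.txt" location="123" sap="dfrtyu"/>'
)

# Lookahead is re-checked before every character the lazy dot consumes.
TEMPERED_LAZY = re.compile(r'<html name=(?:(?!(needed|onemore)).)*?"\/>')

# Assumed variant: [^>] cannot run past the end of the tag,
# so a greedy quantifier is safe and the pattern ends at the first >.
TEMPERED_CLASS = re.compile(r'<html name=(?:(?!needed|onemore)[^>])*>')

lazy_matches = [m.group(0) for m in TEMPERED_LAZY.finditer(SOURCE)]
class_matches = [m.group(0) for m in TEMPERED_CLASS.finditer(SOURCE)]

for m in lazy_matches:
    print(m)
```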

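However, I would recommend filtering with something like this instead: <html name=([^>no]|n(?!eeded)|o(?!nemore))*>. It runs a lookahead only when the current character is n or o, rather than at every character, so it is easier on the regex engine and does less work.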
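The same check on the source string, as a sketch assuming Python's re module in place of PCRE (SOURCE and UNROLLED are illustrative names):

```python
import re

# Source string from the question, split across lines for readability.
SOURCE = (
    '<html name="abc:///Testers/something.txt" in="abc/Testers/something.txt" loci="123" sap="abcdefgh="/>'
    '<html name="abc:///needed.txt" src="abc/needed.txt" location="123" sap="rtyghu"/>'
    '<html name="abc:///Testers/Testers3/Another.txt" in="abc/Testers/Testers3/Another.txt" loci="123" sap="jhkiopjhg"/>'
    '<html name="abc:///onemore.txt" src="abc/onemore.txt" location="123" sap="dfrtyu"/>'
)

# Any character except >, n, o is consumed without a lookahead;
# n and o are consumed only if they do not start "needed" / "onemore".
UNROLLED = re.compile(r'<html name=([^>no]|n(?!eeded)|o(?!nemore))*>')

# The group is capturing, so take group(0) for the full match.
unrolled_matches = [m.group(0) for m in UNROLLED.finditer(SOURCE)]
for m in unrolled_matches:
    print(m)
```

As with the per-character lookahead, this rejects a tag if either word appears anywhere before the closing >, which is sufficient for the sample data.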

regex

pcre