My regex to find a pattern in an img src attribute fails when the searched pattern starts with a t

Question

I built a regular expression to find any href or src attribute value in an HTML string that does not start with 'http'.

My solution seems to work in most cases, except when the attribute value starts with 't'. I don't understand why. Can someone explain why this happens?

Example (in JavaScript):

//this gives the expected match
'<img href="somename.jpg">'.match(/(?:href|src)\=\"([^(http)][^(\")]*)\"/);

//this does NOT give the expected match
'<img href="thisname.jpg">'.match(/(?:href|src)\=\"([^(http)][^(\")]*)\"/);

This is the regex I used:

/(?:href|src)\=\"([^(http)][^(\")]*)\"/

Answer 1

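[^(http)] probably excludes every h, t and p: it is a character class, so it matches one single character that is not h, t, p or a parenthesis. Try "psomename.jpg" and see whether it fails too.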
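The test suggested above can be run directly. This is a small sketch for Node.js or a browser console; the helper name `questionCapture` and the extra sample file names are made up for illustration. It applies the question's regex unchanged and shows that a value is rejected whenever its first character is h, t, p or a parenthesis, not only when it starts with "http".

```javascript
// The question's pattern, unchanged. [^(http)] is a character class: it matches
// exactly ONE character that is not '(', 'h', 't', 'p' or ')'.
const questionPattern = /(?:href|src)\=\"([^(http)][^(\")]*)\"/;

// Hypothetical helper: returns the captured value, or null if there is no match.
function questionCapture(html) {
  const m = html.match(questionPattern);
  return m ? m[1] : null;
}

console.log(questionCapture('<img href="somename.jpg">'));  // somename.jpg
console.log(questionCapture('<img href="thisname.jpg">'));  // null ('t' is in the class)
console.log(questionCapture('<img href="psomename.jpg">')); // null ('p' is in the class)
console.log(questionCapture('<img href="hello.jpg">'));     // null ('h' is in the class)
```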

Answer 2

Try this:

'<img href="thisname.jpg">'.match(/(?:href|src)\=\"([^(http)]?[^(\")]*)\"/);
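A quick check of this pattern, as a sketch for Node.js or a browser console (the helper `answer2Capture` and the example URL are hypothetical): it does match "thisname.jpg", but because the first character class is now optional it can match zero characters, so values that begin with http are accepted as well.

```javascript
// Answer 2's pattern: the leading [^(http)] is optional (?), so it may match
// nothing and the whole value is then consumed by [^(\")]*.
const answer2Pattern = /(?:href|src)\=\"([^(http)]?[^(\")]*)\"/;

// Hypothetical helper: returns the captured value, or null if there is no match.
function answer2Capture(html) {
  const m = html.match(answer2Pattern);
  return m ? m[1] : null;
}

console.log(answer2Capture('<img href="thisname.jpg">'));           // thisname.jpg
// Side effect: an absolute http URL is no longer rejected either.
console.log(answer2Capture('<img src="http://example.com/a.jpg">')); // http://example.com/a.jpg
```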

Answer 3

You are looking for a lookahead assertion:

/(?:href|src)="(?!https?:\/\/)[^"]+"/

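This is a negative lookahead. Here it matches your string only if the attribute value does not start with http:// (or https://). A simpler example is b(?!a), which matches a b that is not followed by an a. There is also negative lookbehind, (?<!string), which older JavaScript engines do not support; it was added to the language in ES2018.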
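A minimal check of the lookahead version, assuming Node.js or a browser console. The capture group around `[^"]+` is an addition to the answer's pattern so the value itself can be read out, and `relativeValue` is a hypothetical helper name.

```javascript
// The lookahead (?!https?:\/\/) sits right after the opening quote. It succeeds
// only if the text there does NOT begin with "http://" or "https://", and it
// consumes no characters itself. The capture group is added for illustration.
const noHttpValue = /(?:href|src)="(?!https?:\/\/)([^"]+)"/;

// Hypothetical helper: returns the captured value, or null if there is no match.
function relativeValue(html) {
  const m = html.match(noHttpValue);
  return m ? m[1] : null;
}

console.log(relativeValue('<img href="somename.jpg">'));            // somename.jpg
console.log(relativeValue('<img href="thisname.jpg">'));            // thisname.jpg
console.log(relativeValue('<img src="http://example.com/a.jpg">'));  // null
console.log(relativeValue('<img src="https://example.com/a.jpg">')); // null
```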

https://www.regular-expressions.info/lookaround.html

Answer 4

[^(http)] is your problem: with it you are basically saying "not h, not t and not p" for a single character.

I'd guess you were thinking of (?!http), a negative lookahead group, to exclude the http literal from the URL.

This should be enough (short and simple):

(?:href|src)="(?!http:\/\/)[^"]*"

That is, if you just want to rule out values that start with http, rather than actually checking whether something is a valid URL.
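To collect every such value from a longer HTML string, one option is a global version of the pattern with `String.prototype.matchAll` (available since ES2020). This sketch is an assumption rather than part of the answer: it adds a capture group and uses `[^"]*` for the value so each match ends at the attribute's own closing quote rather than at the last quote in the string. `nonHttpValues` and the sample HTML are hypothetical.

```javascript
// Global variant of the pattern: a capture group around the value and [^"]*
// (rather than a greedy .*) so a match stops at the attribute's closing quote.
// matchAll requires the g flag.
const nonHttpAttr = /(?:href|src)="(?!http:\/\/)([^"]*)"/g;

// Hypothetical helper: returns every captured href/src value not starting with http://.
function nonHttpValues(html) {
  return Array.from(html.matchAll(nonHttpAttr), m => m[1]);
}

const sampleHtml =
  '<a href="thisname.html"><img src="http://example.com/a.jpg" alt="x"></a>' +
  '<img src="tiny.png">';

console.log(nonHttpValues(sampleHtml)); // [ 'thisname.html', 'tiny.png' ]
```

Like the pattern in this answer, this only rejects http://; using `(?!https?:\/\/)` instead would reject https:// values as well.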

