Very slow RegEx in AHK yet fast in Notepad++

Question

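I want to find a certain string in a web page, so I decided to use regular expressions. (I know my regexes are terrible, but they work.) Both of my expressions run very fast in Notepad++ (probably under 1 s) and on Regex101, but in AutoHotkey they are extremely slow, taking roughly 2 to 5 minutes. How can I fix this?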

sWindowInfo2 = http://www.archiwum.wyborcza.pl/Archiwum/1,0,4583161,20060208LU-DLO,Dzis_bedzie_Piast,.html

whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
whr.Open("GET", sWindowInfo2, false ), whr.Send()
sPage := whr.ResponseText
; get city name (if exists) – the following is very slooooow
if RegExMatch(sPage, "[\s\S]+<dzial>Gazeta\s(.+)<\/dzial>[\s\S]+")
{
    sCity := RegExReplace(sPage, "[\s\S]+<dzial>Gazeta\s(.+)<\/dzial>[\s\S]+", "$1")  ; $1 keeps only the captured city name
    ;MsgBox, % sCity
    city := 1
}
if RegExMatch(sPage, "[\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+")
{
    sCity := RegExReplace(sPage, "[\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+", "$1")  ; $1 keeps only the captured city name
    city := 1
}

Edit: On the page I linked, the match is Lublin. See: https://regex101.com/r/qJ2pF8/1

Answer 1

You don't need RegExReplace to get the captured value. According to the documentation, you can pass a third argument, an output variable, to RegExMatch:

OutputVar: OutputVar is the unquoted name of a variable in which to store a match object, which can be used to retrieve the position, length and value of the overall match and of each captured subpattern, if any are present.

So use a simpler pattern:

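FoundPos := RegExMatch(sPage, "O)<metryczka>GW\s(.+)\snr", SubPat)

It returns the position of the match and stores "Lublin" in SubPat[1]. In AutoHotkey v1, it is the O) option at the start of the pattern that makes RegExMatch store a match object; without it, the overall match goes into SubPat and the first captured subpattern into the pseudo-array variable SubPat1.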
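For readers who want to try the idea outside AutoHotkey, here is a minimal Python sketch of the same approach. It uses Python's re module rather than AutoHotkey's PCRE-based RegExMatch, and the sample page fragment and the find_city name are hypothetical: search once with an unanchored pattern and read the capture group directly, with no RegExReplace step and no [\s\S]+ padding.

```python
import re
from typing import Optional

# Same idea as the answer's RegExMatch call: search once, then read the capture group.
CITY = re.compile(r"<metryczka>GW\s(.+)\snr")


def find_city(page: str) -> Optional[str]:
    """Return the text captured between '<metryczka>GW ' and ' nr', or None."""
    m = CITY.search(page)  # unanchored search: the engine scans for the literal prefix
    return m.group(1) if m else None


# Hypothetical page fragment; in the question the page comes from WinHttpRequest.
sample = "<html>\n<metryczka>GW Lublin nr 32</metryczka>\n</html>"
print(find_city(sample))  # Lublin
```

One difference to keep in mind: RegExMatch in AutoHotkey returns a 1-based position, whereas m.start() in Python is 0-based.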

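With this pattern you avoid the heavy backtracking caused by [\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+: the first [\s\S]+ matches all the way to the end of the string and then has to backtrack, one character at a time, until the following subpatterns can match. The longer the string, the slower this gets. When the pattern does not match at all, the engine repeats the whole attempt from every starting position, so the cost can grow roughly with the square of the page length.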
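To see the backtracking cost described above, the following Python sketch times the question's padded pattern against the plain one. It is again an illustration with Python's re, not AutoHotkey itself, and the fake_page and timed_search helpers are hypothetical. The pages contain no <metryczka> tag, which is the worst case for the padded pattern. Timings depend on the machine and the engine, so the sketch only prints them; the padded pattern's time is expected to grow much faster than the page length, while the plain pattern should stay close to zero.

```python
import re
import time

# The question's padded pattern versus the answer's plain pattern.
PADDED = re.compile(r"[\s\S]+<metryczka>GW\s(.+)\snr[\s\S]+")
PLAIN = re.compile(r"<metryczka>GW\s(.+)\snr")


def timed_search(pattern, text):
    """Run pattern.search(text) once; return (group 1 or None, elapsed seconds)."""
    start = time.perf_counter()
    m = pattern.search(text)
    elapsed = time.perf_counter() - start
    return (m.group(1) if m else None), elapsed


def fake_page(n, with_tag):
    """Build a hypothetical page with n filler characters, optionally with the tag."""
    body = "<p>" + "x" * n + "</p>\n"
    if with_tag:
        body += "<metryczka>GW Lublin nr 32</metryczka>\n"
    return body


if __name__ == "__main__":
    for n in (1000, 2000, 4000):
        page = fake_page(n, with_tag=False)  # no match: every start position is retried
        _, t_padded = timed_search(PADDED, page)
        _, t_plain = timed_search(PLAIN, page)
        print(f"n={n}: padded {t_padded:.4f}s, plain {t_plain:.6f}s")
```

Both patterns capture the same value when the tag is present; only the amount of work the engine does differs.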


Tags: regex, autohotkey