How do I retrieve some information from HTML code in AHK?

I want to retrieve some information from HTML code. Consider the following:

<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>

I can get all of the information in the article-additional-info class with document.getElementsByClassName("article-additional-info")[0].innerText.

But how do I retrieve an individual value from the class, such as 2011 (from <strong>Issue Year:</strong> 2011)? I would like to avoid using RegEx.

Edit:

Following the answers, I modified the code slightly. However, I cannot get rid of one element: Language:. Here is the code:

html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
</body>
)

document := ComObjCreate("HTMLfile")
document.write(html)

test := ["Issue Year:", "Issue No:", "Page Range:", "Page Count:"]

try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }

loop, %count%
{
html := yclass%A_Index%
document.Close
document := ComObjCreate("HTMLfile")
document.write(html)

try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    StringLen, y, % test[A_Index]
    msgbox % [A_Index] . " " . substr(x.parentnode.innerText, y+2)
    }
}
ExitApp

You can easily manipulate the HTML through the HTMLFile COM object and parse the resulting text using StrSplit(). Here is an example using the HTML you provided and your DOM query:

html =
(
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
)

document := ComObjCreate("HTMLfile")
document.write(html)
x := document.getElementsByClassName("article-additional-info")[0].innerText
MsgBox % StrSplit(StrSplit(x, "`n", "`r").5, " ").2

Edit:

url := "https://www.ceeol.com/search/article-detail?id=134854"

html := getPage(url) 

document := ComObjCreate("HTMLfile")
document.write(html)
x := document.getElementsByClassName("article-additional-info")[0].innerText
For k, v in StrSplit(x, "`n", "`r") {
    r .= StrSplit(v, ": ").2 "`n"
}

MsgBox % r

getPage(url) {
    whr := ComObjCreate("WinHttp.WinHttpRequest.5.1")
    whr.Open("GET", URL, true)
    whr.Send()
    ; Using 'true' above and the call below allows the script to remain responsive.
    whr.WaitForResponse()
    return whr.ResponseText
}
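
The loop above collects every value. To pick out a single value by its label without RegEx, the same StrSplit() idea can be wrapped in a small function. This is only a sketch: GetField is a hypothetical helper name, not part of AutoHotkey or of the code above, and the sample text stands in for the innerText that the DOM query returns.

```autohotkey
; Sample text standing in for the innerText of the <ul> element.
text := "Issue Year: 2011`nIssue No: 1 (200)`nLanguage: Polish"

MsgBox % GetField(text, "Issue Year:")  ; shows 2011
MsgBox % GetField(text, "Language:")    ; shows Polish

; GetField is a hypothetical helper: it returns the text that follows
; the given label, or an empty string if no line starts with that label.
GetField(text, label) {
    ; innerText separates the list items with line breaks.
    for index, line in StrSplit(text, "`n", "`r") {
        line := Trim(line)
        ; Only accept lines that begin with the label.
        if (InStr(line, label) = 1)
            return Trim(SubStr(line, StrLen(label) + 1))
    }
    return ""
}
```

Matching on the label rather than on the line position means the lookup keeps working if the page lists the items in a different order.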

Another example, using querySelectorAll:

wb := ComObjCreate("InternetExplorer.Application")
wb.Visible := True
wb.Navigate("https://www.ceeol.com/search/article-detail?id=134854")
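; Wait until the page has finished loading (ReadyState 4 = complete).
While wb.Busy || wb.ReadyState != 4
    Sleep 100

; Collect the text that follows the label in each of the first three list items.
Loop, 3
    r .= wb.document.querySelectorAll(".article-additional-info li")[A_Index-1].lastChild.nodeValue "`n"
MsgBox % r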

wb.quit()
exitapp
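
The same selector can also be run against an HTMLfile object, which avoids opening a browser window. The following is a minimal sketch resting on one assumption that is not stated in the answer: that the document has to be switched to a modern document mode with an X-UA-Compatible meta tag before querySelectorAll becomes available, which is commonly needed with HTMLfile. The variable names items, li, label, and value are illustrative.

```autohotkey
html =
(
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
)

document := ComObjCreate("HTMLfile")
; Assumption: request a modern document mode so querySelectorAll is exposed.
document.write("<meta http-equiv='X-UA-Compatible' content='IE=edge'>")
document.write(html)

items := document.querySelectorAll(".article-additional-info li")
result := ""
; Use the length of the node list instead of a hard-coded count.
Loop % items.length
{
    li := items[A_Index - 1]
    label := li.firstChild.innerText        ; text of the <strong> element
    value := Trim(li.lastChild.nodeValue)   ; text node that follows the label
    result .= label " " value "`n"
}
MsgBox % result
```

Looping over items.length rather than a fixed number means the script does not need to know in advance how many list items the page contains.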

Try this:

html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> XX 2011</li>
   <li><strong>Issue No:</strong> XX 1 (200)</li>
   <li><strong>Page Range:</strong> XX 65-80</li>
   <li><strong>Page Count:</strong> XX 15</li>
   <li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)

test := "Language:"  ;  adjust for the variable you want to return
classno := 1  ;  adjust the number for the correct class instance!

document := ComObjCreate("HTMLfile")
document.write(html)

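; Store the innerHTML of each matching <ul>, numbered by class instance
; (not by <ul> position), so that classno selects the right one.
try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }
html := yclass%classno%

document.Close()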
document := ComObjCreate("HTMLfile")
document.write(html)

try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    StringLen, y, test
    if (x.innerText = test)
        msgbox % substr(x.parentnode.innerText, y+2)  ;  returns "Polish"
    }
ExitApp

And if you want to loop through all class instances and all variables, just do this:

html =
(
<body>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> 2011</li>
   <li><strong>Issue No:</strong> 1 (200)</li>
   <li><strong>Page Range:</strong> 65-80</li>
   <li><strong>Page Count:</strong> 15</li>
   <li><strong>Language:</strong> Polish</li>
</ul>
<ul class="ao">
   <li><strong>Issue Year:</strong> zz 2011</li>
   <li><strong>Issue No:</strong> zz 1 (200)</li>
   <li><strong>Page Range:</strong> zz 65-80</li>
   <li><strong>Page Count:</strong> zz 15</li>
   <li><strong>Language:</strong> zz Polish</li>
</ul>
<ul class="article-additional-info">
   <li><strong>Issue Year:</strong> XX 2011</li>
   <li><strong>Issue No:</strong> XX 1 (200)</li>
   <li><strong>Page Range:</strong> XX 65-80</li>
   <li><strong>Page Count:</strong> XX 15</li>
   <li><strong>Language:</strong> XX Polish</li>
</ul>
</body>
)

document := ComObjCreate("HTMLfile")
document.write(html)

; To skip a variable, change it to "" (as shown here, where only the first 3 are displayed)
test := ["Issue Year:", "Issue No:", "Page Range:", "", ""]

try While (x := document.getElementsByTagName("ul")[A_Index-1])
    {
    if (x.className = "article-additional-info")
        {
        count++
        yclass%count% := x.innerHTML
        }
    }

loop, %count%
{
which++
html := yclass%A_Index%

document.Close()
document := ComObjCreate("HTMLfile")
document.write(html)

try While (x := document.getElementsByTagName("strong")[A_Index-1])
    {
    StringLen, y, % test[A_Index]
    if (test[A_Index] <> "")
        msgbox % which . ": " . test[A_Index] . " " . substr(x.parentnode.innerText, y+2)
    }
}
ExitApp

where substr(x.parentnode.innerText, y+2) is the value you are looking for.
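
To see why the offset is y+2, here is the arithmetic for one item as a small sketch. The variables label and text are illustrative stand-ins for test[A_Index] and x.parentnode.innerText.

```autohotkey
label := "Language:"
text := "Language: Polish"   ; what the innerText of the <li> looks like

y := StrLen(label)           ; 9 characters in the label
; y+1 is the space after the colon, so the value starts at y+2 (character 11).
value := SubStr(text, y + 2)
MsgBox % value               ; shows Polish
```

This relies on exactly one space separating the label from the value, as in the sample HTML.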

Have fun!!