Extract value from HTML Source

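I have a macro that visits a website using a value from column A (for example 517167000), pulls a value from a specific part of the page source, and returns that value to a cell. The HTML source has now changed and I can't seem to get the macro to work.

My original code was: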

Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")

With request
    .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
    .send
    UnitPerBox = Trim(Split(Split(.responseText, "Units per box</td>")(1), "<tr")(0))
End With

End Function

A working example of the website is

https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-517167000

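so that you can visit the site and view the source. It has been a long time since I wrote the original macro, so I thought I could simply change

"Units per box</td>")(1), "<tr" 

to

"Units per pack</td> <td class="value">")(1), "<tr"

but that does not work. The HTML below is what is now on the site; in this example I need the value 2.74.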

<tr>
                <td class="name">Units per pack</td>
                <td class="value">2.74</td>
            </tr>

Any help would be greatly appreciated.

Cheers

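If you are going to do text manipulation on .responseText with Split() anyway, you might as well use a regular expression without setting its Global property, so that only the first match after the label is returned: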

Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")

Dim RegEx As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.Pattern = "\d+(?:\.\d+)?"

With request
    .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
    .send
    UnitPerBox = RegEx.Execute(Split(.responsetext, "Units per pack</td>")(1))(0)
End With

End Function
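
The literal string tried in the question probably fails because the page has a line break and indentation between the two cells rather than a single space (and the inner double quotes would also need doubling inside a VBA string). As a minimal sketch of a slightly stricter variation, the pattern below anchors on the label and lets `\s*` absorb that whitespace, so it can be checked offline against the snippet from the question. `ExtractUnitsPerPack` and `TestExtract` are hypothetical names, and the pattern assumes the name/value cell layout quoted in the question.

```vba
' Hypothetical helper: returns the number in the cell that follows the
' "Units per pack" label, or "" when the label is not found.
Public Function ExtractUnitsPerPack(ByVal html As String) As String
    Dim re As Object
    Dim matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.IgnoreCase = True
    ' \s* absorbs the line break and indentation between the two cells,
    ' which a fixed literal such as "</td> <td ...>" cannot match.
    re.Pattern = "Units per pack</td>\s*<td[^>]*>\s*(\d+(?:\.\d+)?)"

    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        ExtractUnitsPerPack = matches(0).SubMatches(0)
    End If
End Function

' Offline check against the markup quoted in the question.
Public Sub TestExtract()
    Dim sample As String
    sample = "<tr>" & vbCrLf & _
             "    <td class=""name"">Units per pack</td>" & vbCrLf & _
             "    <td class=""value"">2.74</td>" & vbCrLf & _
             "</tr>"
    Debug.Print ExtractUnitsPerPack(sample)   ' prints 2.74
End Sub
```

Inside UnitPerBox this helper could be called as `UnitPerBox = ExtractUnitsPerPack(.responseText)`; returning an empty string when nothing matches avoids the "subscript out of range" error that Split(...)(1) raises if the label is missing.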

However, it is neater (IMO) to avoid text manipulation on .responseText and work with an HTML document instead, retrieving the appropriate data from the HTML table by element ID and table index:

Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")

With request
    .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
    .send
    htmlResponse.body.innerHTML = .responseText
    UnitPerBox = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")(10).getElementsByTagName("td")(1).innerText
End With

End Function

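Note that the table is indexed from 0, which means we are actually retrieving our value from the 11th row, second column. If you are not sure the table contents will always be found at the same index, you can also just loop over the child nodes: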

Public Function UnitPerBox(searchTerm As String) As String
Static request As Object
If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")
Dim Rws As Object, Rw As Object ' Rw is declared so the code also compiles under Option Explicit

With request
    .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
    .send
    htmlResponse.body.innerHTML = .responseText
    Set Rws = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")
    For Each Rw In Rws
        If Rw.getElementsByTagName("td")(0).InnerText = "Units per pack" Then
            UnitPerBox = Rw.getElementsByTagName("td")(1).InnerText
            Exit For
        End If
    Next
End With

End Function

Personally I prefer working with the HTML document over text manipulation, but all of the options above will retrieve your value =)
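
Since the question describes taking the search term from column A and writing the result to a cell, here is a rough sketch of how any of the UnitPerBox versions above might be driven from a sheet. The layout (codes in column A from row 2 down, results in column B of the active sheet), the "n/a" marker, and the name `FillUnitsPerPack` are all assumptions made for illustration.

```vba
' Hypothetical driver: assumes product codes sit in column A from row 2 down
' and that results should be written to column B of the active sheet.
Public Sub FillUnitsPerPack()
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        If Len(Trim$(CStr(ws.Cells(r, "A").Value))) > 0 Then
            ' A failed request or a changed page layout raises a run-time
            ' error inside UnitPerBox, so trap it per row instead of aborting.
            On Error Resume Next
            ws.Cells(r, "B").Value = UnitPerBox(CStr(ws.Cells(r, "A").Value))
            If Err.Number <> 0 Then
                ws.Cells(r, "B").Value = "n/a"
                Err.Clear
            End If
            On Error GoTo 0
        End If
    Next r
End Sub
```

Because UnitPerBox is a Public Function, it should also be usable directly in a cell as `=UnitPerBox(A2)`, provided it lives in a standard module.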