Extract value from HTML Source
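I have a macro that goes to a website and, for a value in column A (e.g. 517167000), extracts a value from a specific part of the page source and returns it to a cell.
The HTML source has now changed and I can't seem to get it working.
My original code is: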
Public Function UnitPerBox(searchTerm As String) As String
    Static request As Object
    If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
    With request
        .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
        .send
        UnitPerBox = Trim(Split(Split(.responseText, "Units per box</td>")(1), "<tr")(0))
    End With
End Function
A working example of the page is
https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-517167000
so you can visit the site and view the source.
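The new HTML is shown below. It has been a long time since I wrote the original macro, so I thought I could just change
"Units per box</td>")(1), "<tr"
to
"Units per pack</td> <td class="value">")(1), "<tr"
The HTML below is what is on the site now. In this example I need the value 2.74, but the change doesn't work.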
<tr>
<td class="name">Units per pack</td>
<td class="value">2.74</td>
</tr>
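The attempted delimiter cannot work as written: inside a VBA string literal a double quote must be doubled (class=""value""), the two <td> cells in the snippet above appear to be separated by a line break rather than a single space, and splitting the remainder on "<tr" would keep the trailing </td>. A minimal Split-only sketch that avoids all three problems is below. ValueAfterLabel is a hypothetical helper name, and it assumes the value cell always directly follows the label cell with class="value", as in the snippet.

```vba
' Hypothetical helper (not part of the original post): returns the text of the
' value cell that follows the cell containing `label`, using only Split().
' It does not depend on the whitespace between the two <td> tags.
Public Function ValueAfterLabel(ByVal html As String, ByVal label As String) As String
    Dim parts() As String

    ' Keep only the text after the label cell
    parts = Split(html, label & "</td>")
    If UBound(parts) < 1 Then Exit Function        ' label not found

    ' Jump to the start of the value cell; quotes inside a VBA literal are doubled
    parts = Split(parts(1), "class=""value"">")
    If UBound(parts) < 1 Then Exit Function        ' value cell not found

    ' Keep everything up to the closing </td>
    parts = Split(parts(1), "</td>")
    If UBound(parts) < 0 Then Exit Function        ' nothing left to read
    ValueAfterLabel = Trim$(parts(0))
End Function
```

Inside the original function this would be used as UnitPerBox = ValueAfterLabel(.responseText, "Units per pack").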
Any help would be greatly appreciated.
Cheers
If you are going to use Split() for text manipulation on .responseText anyway, you might as well use a regular expression (without setting its Global property):
Public Function UnitPerBox(searchTerm As String) As String
    Static request As Object
    If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    RegEx.Pattern = "\d+(?:\.\d+)?"   ' first integer or decimal number
    With request
        .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
        .send
        ' First number found anywhere after the label cell
        UnitPerBox = RegEx.Execute(Split(.responseText, "Units per pack</td>")(1))(0)
    End With
End Function
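The pattern above returns the first number found anywhere after the label cell, so if a number ever appeared in the markup between the label and the value (in an attribute, say), that number would be returned instead. A more targeted variant, assuming the value sits in the very next <td> as in the question's snippet, anchors the pattern on the label and captures the cell text with a group. RegexValueAfterLabel is a hypothetical name; the sketch relies only on \s, a negated character class and SubMatches, all of which VBScript's RegExp supports.

```vba
' Hypothetical variant (not the answerer's code): one VBScript regex that anchors
' on the label cell and captures the text of the next cell in a group.
Public Function RegexValueAfterLabel(ByVal html As String, ByVal label As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = False          ' the first match is enough
    re.IgnoreCase = True
    ' \s* allows any whitespace, including line breaks, between the two cells.
    ' Assumes `label` contains no regex metacharacters.
    re.Pattern = label & "</td>\s*<td[^>]*>([^<]*)</td>"
    Set matches = re.Execute(html)
    If matches.Count > 0 Then RegexValueAfterLabel = Trim$(matches(0).SubMatches(0))
End Function
```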
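Neater (IMO), however, is to avoid text manipulation of .responseText altogether and process an HTML document instead, retrieving the appropriate data from the HTML table by element ID and row index: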
Public Function UnitPerBox(searchTerm As String) As String
    Static request As Object
    If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
    Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")
    With request
        .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
        .send
        ' Load the response into an HTML document so it can be queried like a DOM
        htmlResponse.body.innerHTML = .responseText
        UnitPerBox = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")(10).getElementsByTagName("td")(1).innerText
    End With
End Function
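Note that the collections are zero-indexed, so index 10 is actually the 11th row and index 1 its second cell. If you are not sure the value will always be found at the same index, you can loop over the rows instead and match on the label text: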
Public Function UnitPerBox(searchTerm As String) As String
    Static request As Object
    If request Is Nothing Then Set request = CreateObject("msxml2.xmlhttp")
    Dim htmlResponse As Object: Set htmlResponse = CreateObject("htmlfile")
    Dim Rws As Object
    Dim Rw As Object   ' declared so the code also compiles under Option Explicit
    With request
        .Open "GET", "https://larsonjuhl.co.uk/mouldings/larson-juhl-essentials/arq-essentials-moulding-" & searchTerm, False
        .send
        htmlResponse.body.innerHTML = .responseText
        Set Rws = htmlResponse.body.document.getElementById("specifications").getElementsByTagName("tr")
        ' Find the row whose first cell is the label, then read its second cell
        For Each Rw In Rws
            If Rw.getElementsByTagName("td")(0).innerText = "Units per pack" Then
                UnitPerBox = Rw.getElementsByTagName("td")(1).innerText
                Exit For
            End If
        Next
    End With
End Function
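The loop above is likely to stop with a run-time error if the page has no element with id "specifications", or if a row has fewer than two <td> cells (for example a header row built from <th>). A more defensive sketch, assuming the same "specifications" id, is below. SpecValue is a hypothetical name, and it takes an already-parsed htmlfile document so the parsing can be tried without a network request.

```vba
' Hypothetical helper (not from the original answer): finds the row whose first
' cell equals `label` inside the element with id "specifications" and returns
' the text of its second cell. Returns "" instead of raising an error when the
' table or the matching row is missing.
Public Function SpecValue(ByVal doc As Object, ByVal label As String) As String
    Dim tbl As Object, rw As Object, tds As Object

    Set tbl = doc.getElementById("specifications")
    If tbl Is Nothing Then Exit Function

    For Each rw In tbl.getElementsByTagName("tr")
        Set tds = rw.getElementsByTagName("td")
        ' Skip rows that do not have both a label cell and a value cell
        If tds.Length >= 2 Then
            If Trim$(tds(0).innerText) = label Then
                SpecValue = Trim$(tds(1).innerText)
                Exit Function
            End If
        End If
    Next rw
End Function
```

In UnitPerBox it would replace the loop: after htmlResponse.body.innerHTML = .responseText, use UnitPerBox = SpecValue(htmlResponse, "Units per pack").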
I personally prefer working with the HTML document over text manipulation, but all of the options above will retrieve your value =)