Excel VBA extracting href value
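I have a macro that tries to extract all href values from a page, but it only seems to get the first one. Any help would be greatly appreciated.
The URL I'm using is https://www.facebook.com/marketplace/vancouver/entertainment
Relevant HTML (originally posted as a screenshot):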
<div class="_3-98" data-testid="marketplace_home_feed">
  <div>
    <div>
      <div class="_65db">
        <a class="_1oem" href="/marketplace/item/920841554781924" data-testid="marketplace_feed_item">
        <a class="_1oem" href="/marketplace/item/580124349088759" data-testid="marketplace_feed_item">
        <a class="_1oem" href="/marketplace/item/1060730340772072" data-testid="marketplace_feed_item">
Sub Macro1()
    marker = 0
    Set objShell = CreateObject("Shell.Application")
    IE_count = objShell.Windows.Count
    For x = 0 To (IE_count - 1)
        On Error Resume Next ' sometimes more web pages are counted than are open
        my_url = objShell.Windows(x).document.Location
        my_title = objShell.Windows(x).document.Title
        If my_title Like "Facebook" & "*" Then 'compare to find if the desired web page is already open
            Set ie = objShell.Windows(x)
            marker = 1
            Exit For
        Else
        End If
    Next
    Set my_data = ie.document.getElementsByClassName("_3-98")
    Dim link
    i = 1
    For Each elem In my_data
        Set link = elem.getElementsByTagName("a")(0)
        i = i + 1
        'copy the data to the excel sheet
        ActiveSheet.Cells(i, 4).Value = link.href
    Next
End Sub
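You are only asking for the first anchor element, getElementsByTagName("a")(0), inside each element with class _3-98, and the HTML shows only one element with that class, so you get a single link. Loop through the whole collection of anchor elements inside the parent element instead.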
...
Dim j As Long
Dim anchors As Object
Set my_data = ie.document.getElementsByClassName("_65db")
For Each elem In my_data
    'every anchor inside this parent, not just the first one
    Set anchors = elem.getElementsByTagName("a")
    For i = 0 To anchors.Length - 1
        j = j + 1
        ActiveSheet.Cells(j, 4).Value = anchors(i).href
    Next i
Next elem
...
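You can use a CSS selector combination to get the elements. It would be easier to test and settle on the best combination if you posted the actual HTML rather than an image. The selector is applied with the querySelectorAll method, which returns a nodeList of all matching elements. You then loop over the nodeList by index, from 0 to .Length - 1, to access each item.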
VBA:
Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length - 1
    'read .href explicitly rather than relying on the element's default property
    ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
Next
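The CSS selector combination is ._1oem[href], which selects elements that have class _1oem and also have an href attribute. The "." is a class selector and [] is an attribute selector. This is a robust approach.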
The above assumes there are no parent form/frame/iframe tags to negotiate.
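If the links did turn out to sit inside an iframe, the selector would have to be run against that frame's document instead of the top-level one. The snippet below is a minimal sketch under two assumptions that are not in the original: the links live in the first iframe on the page, and that iframe is same-origin (otherwise IE denies access to its document).

```vba
'Sketch only: assumes the links are inside the FIRST iframe on the page
'(hypothetical) and that the frame is same-origin, otherwise access is denied.
Dim frameEl As Object, frameDoc As Object
Dim aNodeList As Object, i As Long

Set frameEl = ie.document.querySelector("iframe")
If Not frameEl Is Nothing Then
    Set frameDoc = frameEl.contentWindow.document
    Set aNodeList = frameDoc.querySelectorAll("._1oem[href]")
    For i = 0 To aNodeList.Length - 1
        ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
    Next
End If
```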
An alternative selector, which matches on two attributes instead of the class, is:
ie.document.querySelectorAll("[data-testid='marketplace_feed_item'][href]")
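Plugging that selector into the same index loop looks like the sketch below. It assumes ie already holds the Marketplace page, as in the question's macro. Matching on the data-testid attribute may hold up better than a generated class name such as _1oem if the site's CSS class names change, though that is not guaranteed.

```vba
'Same index loop as above, selecting on attributes instead of the class _1oem.
'Assumes ie already references the open Marketplace page.
Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("[data-testid='marketplace_feed_item'][href]")
For i = 0 To aNodeList.Length - 1
    ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
Next
```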
Full example:
Option Explicit
'Requires a reference (VBE > Tools > References) to Microsoft Internet Controls
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.facebook.com/marketplace/vancouver/entertainment"
        While .Busy Or .readyState < 4: DoEvents: Wend   'wait for the page to finish loading

        Dim aNodeList As Object, i As Long
        Set aNodeList = .document.querySelectorAll("._1oem[href]")
        For i = 0 To aNodeList.Length - 1
            ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
        Next
        '.Quit '<== Remember to quit the application
    End With
End Sub
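Content that a page renders with script can appear after readyState reaches 4, in which case querySelectorAll may briefly return an empty nodeList. The variant below is a hedged sketch that polls for the links with a timeout before reading them; the 10-second limit and the name GetInfoWithWait are illustrative choices, not part of the original answer.

```vba
Option Explicit
'Requires a reference to Microsoft Internet Controls.
'Sketch only: polls until the feed links appear or a hypothetical timeout passes.
Public Sub GetInfoWithWait()
    Const MAX_WAIT_SECONDS As Double = 10   'hypothetical timeout
    Dim IE As New InternetExplorer
    Dim aNodeList As Object, i As Long, t As Double

    With IE
        .Visible = True
        .navigate "https://www.facebook.com/marketplace/vancouver/entertainment"
        While .Busy Or .readyState < 4: DoEvents: Wend

        t = Timer   'note: Timer resets at midnight; acceptable for a sketch
        Do
            DoEvents
            Set aNodeList = .document.querySelectorAll("._1oem[href]")
            If Timer - t > MAX_WAIT_SECONDS Then Exit Do
        Loop While aNodeList.Length = 0

        For i = 0 To aNodeList.Length - 1
            ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
        Next
        .Quit
    End With
End Sub
```

If the timeout is reached with nothing found, the For loop simply does not run, so no cells are written.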