Excel VBA extracting href value

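I have a macro that tries to extract all the href values from a page, but it only seems to get the first one. I would appreciate it if someone could help me out.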

The URL I am using is https://www.facebook.com/marketplace/vancouver/entertainment

Screenshot of HTML

<div class="_3-98" data-testid="marketplace_home_feed">
  <div>
    <div>
      <div class="_65db">
          <a class="_1oem" href="/marketplace/item/920841554781924" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/580124349088759" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/1060730340772072" data-testid="marketplace_feed_item">

Sub Macro1()
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
    On Error Resume Next    ' sometimes more web pages are counted than are open
    my_url = objShell.Windows(x).document.Location
    my_title = objShell.Windows(x).document.Title

    If my_title Like "Facebook" & "*" Then 'compare to find if the desired web page is already open
        Set ie = objShell.Windows(x)
        marker = 1
        Exit For
    Else
    End If
Next

Set my_data = ie.document.getElementsByClassName("_3-98")
Dim link
i = 1
For Each elem In my_data
    Set link = elem.getElementsByTagName("a")(0)
    i = i + 1

     'copy the data to the excel sheet
    ActiveSheet.Cells(i, 4).Value = link.href

Next

End Sub

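You are only asking for the first anchor element within each element that has the _3-98 class. Iterate through the collection of anchor elements within the parent element instead.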

...

Dim j As Long
Set my_data = ie.document.getElementsByClassName("_65db")

For Each elem In my_data

    ' loop over every anchor in this parent, not just the first
    For i = 0 To elem.getElementsByTagName("a").Length - 1

        j = j + 1
        ActiveSheet.Cells(j, 4).Value = elem.getElementsByTagName("a")(i).href

    Next i

Next elem

...

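You can use a CSS selector combination to get the elements. If you provide the actual HTML rather than an image, it will be easier to test and determine the best combination. The selector is applied via the querySelectorAll method, which returns a nodeList of all matching elements. You loop over the nodeList's .Length to access items by index, from 0 to .Length - 1.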

VBA:

Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length - 1
    ' read the href explicitly rather than relying on the element's default property
    ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i).href
Next

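The CSS selector combination is ._1oem[href], which selects elements that have the class _1oem and an href attribute. The "." is a class selector and [] is an attribute selector. This is a fast and robust approach.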

The above assumes there are no parent form/frame/iframe tags to negotiate.
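If the links did turn out to sit inside an iframe, one possible way to reach them is through the frame's own document. This is only a sketch: it assumes the anchors are in the first iframe on the page and that the iframe is served from the same origin as the parent page (a cross-origin frame will raise an access error). The names frameDoc, frameLinks and n are hypothetical.

```vba
' Sketch only: assumes ie already points at the loaded page and that the
' anchors live in the first same-origin iframe.
Dim frameDoc As Object, frameLinks As Object, n As Long
Set frameDoc = ie.document.getElementsByTagName("iframe")(0).contentDocument
Set frameLinks = frameDoc.querySelectorAll("._1oem[href]")
For n = 0 To frameLinks.Length - 1
    ' write each link's URL to column D, starting at row 2
    ActiveSheet.Cells(n + 2, 4).Value = frameLinks.item(n).href
Next n
```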

An alternative selector, which matches on two attributes rather than on the class, would be:

html.querySelectorAll("[data-testid='marketplace_feed_item'][href]")
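As a rough sketch, the attribute-based selector can be used in the same kind of loop. This assumes ie is the Internet Explorer instance from the question's macro, used here in place of the html variable in the line above; feedLinks and k are hypothetical names.

```vba
' Sketch only: assumes ie already points at the loaded Marketplace page.
Dim feedLinks As Object, k As Long
Set feedLinks = ie.document.querySelectorAll("[data-testid='marketplace_feed_item'][href]")
For k = 0 To feedLinks.Length - 1
    ' .href returns the resolved URL of each matching anchor
    ActiveSheet.Cells(k + 2, 4).Value = feedLinks.item(k).href
Next k
```

Matching on data-testid may hold up better than the class name if the generated class names change, though that depends on how the page is built.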

Full example:

Option Explicit
' Requires a reference to Microsoft Internet Controls (Tools > References)
Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.facebook.com/marketplace/vancouver/entertainment"

        ' wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim aNodeList As Object, i As Long
        Set aNodeList = .document.querySelectorAll("._1oem[href]")
        For i = 0 To aNodeList.Length - 1
            ' write each link's URL to column D, starting at row 2
            ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i).href
        Next
        '.Quit '<== Remember to quit application
    End With
End Sub