VBA WebScraping return empty values

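I have the following code to scrape data from a website. The problem is that it doesn't scrape anything: it shows no errors, but it gives me no results either...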

Option Explicit

Public Sub Loiça()
Sheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
    Dim IE As New InternetExplorer, i As Long, data As Object, div As Object, item As Object, r As Long, c As Long
    With IE
        .Visible = False
        .Navigate2 "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"

        While .Busy Or .readyState < 4: DoEvents: Wend

        Dim numResults As Long, arr() As String
        arr = Split(.document.querySelector(".status.cb").innerText, Chr$(32))
        numResults = arr(LBound(arr))
        Dim resultsPerPage As Long
        resultsPerPage = .document.querySelectorAll(".data cb").Length
            If i > 1 Then
                .Navigate2 ("https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/")
                While .Busy Or .readyState < 4: DoEvents: Wend
            End If
            Set data = .document.getElementsByClassName("data cb")
            For Each item In data
                r = r + 1: c = 1
                For Each div In item.getElementsByTagName("div")
                    With ThisWorkbook.Worksheets("Loiça")
                        .Cells(r, c) = div.innerText
                    End With
                    c = c + 1
                Next
            Next
        .Quit
    End With
    '---------------------------------------------------------------------------'
End Sub

This was an interesting challenge. Some notes:

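  1. The page doesn't appear to load in Internet Explorer (at least for me), possibly because legacy browsers are not supported. Switching to Selenium Basic and Chrome was therefore needed. After downloading and installing Selenium Basic, you may have to swap the ChromeDriver.exe in the Selenium folder for the latest version. You then need to go to VBE > Tools > References and add a reference to Selenium Type Library.
  2. The page loads dynamically with ajax, in batches of 12 records. You need to scroll the page until all results are displayed.
  3. You can't retrieve the result count the way it is displayed, because the returned string is formatted differently and may vary. Instead, you can read the total from the element that stores this number.
  4. To keep writing with your existing syntax, you need to transfer the page HTML into an HTMLDocument variable and then work with that.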
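
The last note can be tried on its own, without a browser. Below is a minimal sketch, assuming a reference to Microsoft HTML Object Library has been added; the sample markup and the sub name are made up for illustration and are not taken from the site:

```vba
Option Explicit

' Sketch: load an HTML string into an HTMLDocument and walk it with the
' same calls the question used on IE's .document.
' Requires: VBE > Tools > References > Microsoft HTML Object Library.
Public Sub ParseHtmlStringSketch()
    Dim html As HTMLDocument, item As Object, div As Object
    ' Hypothetical markup standing in for the driver's .PageSource
    Const SAMPLE As String = _
        "<div class=""data""><div>Product A</div><div>199 EUR</div></div>" & _
        "<div class=""data""><div>Product B</div><div>249 EUR</div></div>"

    Set html = New HTMLDocument
    html.body.innerHTML = SAMPLE

    ' Each "data" block is one record; each inner div is one field
    For Each item In html.getElementsByClassName("data")
        For Each div In item.getElementsByTagName("div")
            Debug.Print div.innerText
        Next div
    Next item
End Sub
```

Once this works against a static string, swapping SAMPLE for the driver's page source should be the only change needed.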

VBA:

Option Explicit
Public Sub Loiça()
    ' Timer returns a Single (seconds since midnight), so t is declared As Single
    Dim d As WebDriver, t As Single, i As Long, data As Object, div As Object, item As Object, r As Long, c As Long
    Dim numResults As Long, html As HTMLDocument
    Const MAX_WAIT_SEC As Long = 600
    Const URL As String = "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"

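    Set d = New ChromeDriver

    With d
        .Start "Chrome"
        .get URL

        Worksheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
        ' Total number of results, read from the element that stores it
        numResults = .FindElementByCss("#total").Text
        t = Timer
        ' Keep scrolling so further batches load, until the count shown in
        ' #products matches the total or the timeout is reached
        Do
            .ExecuteScript "window.scrollBy(0, window.innerHeight);", "javascript"
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop Until .FindElementByCss("#products").Text = numResults
        ' Copy the fully loaded page into an HTMLDocument so the original
        ' parsing syntax can be reused
        Set html = New HTMLDocument
        html.body.innerHTML = .PageSource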
        Set data = html.getElementsByClassName("data")
        For Each item In data
            r = r + 1: c = 1
            For Each div In item.getElementsByTagName("div")
                With ThisWorkbook.Worksheets("Loiça")
                    .Cells(r, c) = div.innerText
                End With
                c = c + 1
            Next
        Next
        .Quit
    End With
End Sub
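
The scroll loop above polls as fast as it can. A possible variant, sketched here as a separate helper, pauses briefly between scrolls and compares the counts as numbers. The helper name, the 500 ms pause, and the `>=` comparison are assumptions of this sketch rather than part of the code above; the `#total` and `#products` selectors are the ones already used there and may change if the site is redesigned:

```vba
Option Explicit

' Sketch only: scroll until the loaded count reaches the total, pausing
' between scrolls. Requires a reference to Selenium Type Library.
Public Sub ScrollUntilAllLoaded(ByVal d As WebDriver, _
                                Optional ByVal maxWaitSec As Long = 600)
    Dim startTime As Single, total As Long, loaded As Long

    ' CLng raises an error if the text is not numeric; that is assumed
    ' not to happen for these two elements
    total = CLng(d.FindElementByCss("#total").Text)
    startTime = Timer
    Do
        d.ExecuteScript "window.scrollBy(0, window.innerHeight);"
        d.Wait 500                               ' give the ajax batch time to arrive
        loaded = CLng(d.FindElementByCss("#products").Text)
        If Timer - startTime > maxWaitSec Then Exit Do
    Loop Until loaded >= total
End Sub
```

It would be called as `ScrollUntilAllLoaded d` right after `.get URL`, in place of the inline loop.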