VBA web scraping returns empty values
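I have the following code to scrape data from a website. The problem is that it does not scrape anything: it raises no error, but it gives me no results either.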
Option Explicit
Public Sub Loiça()
    Sheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
    Dim IE As New InternetExplorer, i As Long, data As Object, div As Object, item As Object, r As Long, c As Long
    With IE
        .Visible = False
        .Navigate2 "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"
        While .Busy Or .readyState < 4: DoEvents: Wend
        Dim numResults As Long, arr() As String
        arr = Split(.document.querySelector(".status.cb").innerText, Chr$(32))
        numResults = arr(LBound(arr))
        Dim resultsPerPage As Long
        resultsPerPage = .document.querySelectorAll(".data cb").Length
        If i > 1 Then
            .Navigate2 ("https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/")
            While .Busy Or .readyState < 4: DoEvents: Wend
        End If
        Set data = .document.getElementsByClassName("data cb")
        For Each item In data
            r = r + 1: c = 1
            For Each div In item.getElementsByTagName("div")
                With ThisWorkbook.Worksheets("Loiça")
                    .Cells(r, c) = div.innerText
                End With
                c = c + 1
            Next
        Next
        .Quit
    End With
    '---------------------------------------------------------------------------'
End Sub
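This was an interesting challenge. A few notes:
- The page does not appear to load in Internet Explorer (at least for me), probably because legacy browsers are not supported. Switching to Selenium Basic and Chrome was therefore needed. After downloading and installing Selenium Basic, you may have to swap the ChromeDriver.exe in the Selenium folder for the latest version. Then go to VBE > Tools > References and add a reference to Selenium Type Library.
- The page loads dynamically via AJAX, in batches of 12 records. You need to scroll the page until all results are displayed.
- You cannot retrieve the results count in the same way as shown, because the returned string is formatted differently and may vary. Instead, read the total from the element that stores this number.
- To keep writing with your existing syntax, transfer the page HTML into an HTMLDocument variable and work with that.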
VBA:
Option Explicit
' Requires references (VBE > Tools > References):
' Selenium Type Library and Microsoft HTML Object Library
Public Sub Loiça()
    Dim d As WebDriver, t As Single, data As Object, div As Object, item As Object, r As Long, c As Long
    Dim numResults As Long, html As HTMLDocument
    Const MAX_WAIT_SEC As Long = 600
    Const URL As String = "https://www.radiopopular.pt/categoria/maquina-de-lavar-louca/"
    Set d = New ChromeDriver
    With d
        .Start "Chrome"
        .get URL
        Worksheets("Loiça").Range("A:A,Z:Z").EntireColumn.Delete
        ' Total number of results, read from the element that stores it
        numResults = .FindElementByCss("#total").Text
        t = Timer
        ' Scroll until the loaded count matches the total, or until the timeout
        Do
            .ExecuteScript "window.scrollBy(0, window.innerHeight);", "javascript"
            If Timer - t > MAX_WAIT_SEC Then Exit Do
        Loop Until .FindElementByCss("#products").Text = numResults
        ' Copy the rendered page into an HTMLDocument so the existing parsing syntax still works
        Set html = New HTMLDocument
        html.body.innerHTML = .PageSource
        Set data = html.getElementsByClassName("data")
        ' One row per product, one column per nested div
        For Each item In data
            r = r + 1: c = 1
            For Each div In item.getElementsByTagName("div")
                With ThisWorkbook.Worksheets("Loiça")
                    .Cells(r, c) = div.innerText
                End With
                c = c + 1
            Next
        Next
        .Quit
    End With
End Sub
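The note above about the results count says the string format may vary. If the text read from the count element ever contains more than bare digits, the direct assignment to a Long would fail with a type mismatch. A minimal sketch of a defensive parser follows. FirstNumberIn is a hypothetical helper name, not part of the answer above, and whether the page ever returns non-numeric text there is an assumption.

```vba
Option Explicit

' Hypothetical helper (an assumption, not part of the original answer):
' returns the first run of digits in s as a Long, or -1 if s has no digits.
Public Function FirstNumberIn(ByVal s As String) As Long
    Dim i As Long, ch As String, digits As String
    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If ch Like "[0-9]" Then
            digits = digits & ch
        ElseIf Len(digits) > 0 Then
            Exit For ' the first number has ended
        End If
    Next i
    If Len(digits) = 0 Then
        FirstNumberIn = -1
    Else
        FirstNumberIn = CLng(digits) ' overflows for runs longer than a Long can hold
    End If
End Function
```

It could then wrap the existing read, for example numResults = FirstNumberIn(.FindElementByCss("#total").Text), with a check for -1 before entering the scroll loop.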