How can I scrape multiple pages/links at once using VBA?

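I'm currently trying to scrape information from this Reddit page. My goal is to have Excel open all the posts in new tabs and then scrape information from each of those pages, because the starting page doesn't have much information.

I've been trying to work this out for the past few hours, but I admit I'm confused about how to go about it and unsure what to do next, so any pointers would be greatly appreciated!

Here is my current code. It works well, but as I said, I'm not sure what to do next to open the links it finds and scrape the data page by page. At the moment the links on the first page are scraped and immediately written to my spreadsheet, but if possible I'd like to skip that step and scrape all the pages in one go.

Thanks! :)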

Sub GetData()

'Requires a reference to Microsoft Internet Controls (Tools > References in the VBA editor)
Dim objIE As InternetExplorer
Dim itemEle As Object
Dim upvote As Integer, awards As Integer, animated As Integer
Dim postdate As String, upvotepercent As String, oc As String, filetype As String, linkurl As String, myhtmldata As String, visiComments As String, totalComments As String, removedComments As String
Dim y As Integer

Set objIE = New InternetExplorer
objIE.Visible = False

objIE.navigate (ActiveCell.Value)
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

y = 1

For Each itemEle In objIE.document.getElementsByClassName("flat-list buttons")
visiComments = itemEle.getElementsByTagName("a")(0).innerText
linkurl = itemEle.getElementsByTagName("a")(0).href
Sheets("Sheet1").Range("A" & y).Value = visiComments
Sheets("Sheet1").Range("B" & y).Value = linkurl
y = y + 1
Next

'Close the hidden IE instance so it does not keep running in the background
objIE.Quit

End Sub

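You should be able to collect the URLs, visit each one in a loop, write the results from each visited page into an array, and then write the whole array to the sheet in one go. After your existing line

Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop

add the following (it takes the place of your For Each loop, since the links no longer need to be written to the sheet first):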

Dim nodeList As Object, i As Long, urls(), results()

Set nodeList = objIE.document.querySelectorAll(".comments")
If nodeList.Length = 0 Then Exit Sub 'no post links found; avoids ReDim urls(0 To -1)
ReDim urls(0 To nodeList.Length - 1)
ReDim results(1 To nodeList.Length, 1 To 3)
'Store all urls in an array to loop over later
For i = 0 To nodeList.Length - 1
    urls(i) = nodeList.item(i).href
Next

For i = LBound(urls) To UBound(urls)
    objIE.Navigate2 urls(i)
    While objIE.Busy Or objIE.readyState <> 4: DoEvents: Wend
    'may need a pause here
    results(i + 1, 1) = objIE.document.querySelector("a.title").innerText 'title
    results(i + 1, 2) = objIE.document.querySelector(".number").innerText 'upvotes
    results(i + 1, 3) = objIE.document.querySelector(".word").NextSibling.nodeValue '% upvoted
Next
ActiveSheet.Cells(1, 1).Resize(UBound(results, 1), UBound(results, 2)) = results

Note: because VBA is single-threaded, opening the posts in separate tabs would probably only save you some page-loading time. To do that, you would need to store a reference to each tab, or open all the tabs first and then loop through the relevant open windows to do the scraping. To be honest, my preference is to stay in the same tab.
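The loop above waits with an unbounded busy loop and only hints that a pause may be needed. If a page never reaches readyState 4, the macro hangs. If a page lacks one of the elements, `querySelector` returns `Nothing` and `.innerText` raises run-time error 91. Below is a minimal sketch of three hypothetical helpers that guard against both cases. The names `ElapsedSeconds`, `WaitForPage` and `TextOrEmpty` are illustrative and not part of the answer or any library. The sketch assumes the same `objIE` object is passed in late-bound `As Object`.

```vba
' Hypothetical helper: seconds elapsed between two Timer readings.
' Timer counts seconds since midnight, so a negative difference means
' midnight was crossed and one day (86400 s) is added back.
Function ElapsedSeconds(ByVal startTime As Double, ByVal nowTime As Double) As Double
    ElapsedSeconds = nowTime - startTime
    If ElapsedSeconds < 0 Then ElapsedSeconds = ElapsedSeconds + 86400
End Function

' Hypothetical helper: waits until IE reports the page as complete
' (readyState 4 = READYSTATE_COMPLETE) or until timeoutSeconds have passed.
' Returns True if the page finished loading in time, False on timeout.
Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    startTime = Timer
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
        ' Give up after the timeout; the function's default return value is False
        If ElapsedSeconds(startTime, Timer) > timeoutSeconds Then Exit Function
    Loop
    WaitForPage = True
End Function

' Hypothetical helper: innerText of the first element matching cssSelector,
' or an empty string when the page has no such element
' (querySelector returns Nothing rather than raising an error).
Function TextOrEmpty(ByVal doc As Object, ByVal cssSelector As String) As String
    Dim el As Object
    Set el = doc.querySelector(cssSelector)
    If Not el Is Nothing Then TextOrEmpty = el.innerText
End Function
```

In the answer's loop, you could replace the `While ... Wend` line and the first two `querySelector` lines with an `If WaitForPage(objIE, 30) Then ... End If` block. Inside that block, assignments such as `results(i + 1, 1) = TextOrEmpty(objIE.document, "a.title")` would fill the row. A slow or incomplete page then leaves blank cells instead of stopping the macro. The 30-second default is an arbitrary choice.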