尝试从多个 URL 导入数据,但总是卡在第一个 URL

Trying to Import Data from Several URLs but Always Stuck on the First URL

我想我走错了路,现在我真的偏离了正轨。我正在尝试遍历 93 URLs 并从每个中导入数据。这是我正在测试的代码。

Sub Web_Table()
    Dim HTMLDoc As New HTMLDocument
    Dim objTable As Object
    Dim lRow As Long
    Dim lngTable As Long
    Dim lngRow As Long
    Dim lngCol As Long
    Dim ActRw As Long
    Dim objIE As InternetExplorer
    Set objIE = New InternetExplorer
    
    Dim sht As Worksheet
    Dim LastRow As Long
    Set sht = ActiveSheet
    lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row

    For i = 1 To 93
    
        objIE.Navigate "https://etfdb.com/screener/#tab=returns&page=" & i
    
        Do Until objIE.ReadyState = 4 And Not objIE.Busy
            DoEvents
        Loop
        
        'Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
        
        HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
        With HTMLDoc.body
            Set objTable = .getElementsByTagName("table")
            For lngTable = 0 To objTable.Length - 1
                For lngRow = 0 To objTable(lngTable).Rows.Length - 1
                    For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
                        ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
                    Next lngCol
                Next lngRow
                ActRw = ActRw + objTable(lngTable).Rows.Length + 1
            Next lngTable
        End With
        
        Debug.Print i
        lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
        
    Next i
    
    objIE.Quit
    
End Sub

我期待第一眼看到这个 URL:

Symbol  ETF Name    1 Week  4 Week  YTD 1 Year  3 Year  5 Year  Returns Rating
SPY SPDR S&P 500 ETF    -2.44%  7.19%   -2.19%  10.36%  36.10%  65.39%  
IVV iShares Core S&P 500 ETF    -2.46%  7.20%   -2.22%  10.48%  36.60%  65.41%  
VTI Vanguard Total Stock Market ETF -2.45%  7.88%   -2.58%  9.38%   33.94%  60.89%  
etc.

然后第二个URL:

Symbol  ETF Name    1 Week  4 Week  YTD 1 Year  3 Year  5 Year  Returns Rating
VGT Vanguard Information Technology ETF 0.15%   8.00%   11.98%  34.51%  98.99%  168.56% 
XLK Technology Select Sector SPDR Fund  0.11%   7.44%   12.41%  36.64%  92.76%  161.90% 
etc.    

接下来是第三个 URL:

Symbol  ETF Name    1 Week  4 Week  YTD 1 Year  3 Year  5 Year  Returns Rating
IXUS    iShares Core MSCI Total International Stock ETF -2.42%  9.27%   -10.66% -0.97%  4.29%   12.08%  
SCHF    Schwab International Equity ETF -2.48%  9.96%   -10.05% -0.66%  4.24%   11.43%  
etc.    

由于某些奇怪的原因,它似乎卡在第一个 URL 上,而它永远不会进入第二个、第三个等。我有 objIE.NavigateDo Until objIE.ReadyState = 4 And Not objIE.Busy.我在这里没有看到什么?

您需要留出足够的时间让页面完全加载。因此,在页面上寻找仅在加载所有所需数据时才出现的内容,然后对其进行测试。例如,我经常测试结果中的行数 table。

就个人而言,如果允许抓取,我会切换到 xmlhttp 以避免时间问题。您可以更改 per_page 并以更少的请求获得更多结果。这是一个例子:

您将需要一个 json 解析器,例如 jsonconverter.bas,以根据需要将返回的 json 解析为 table。

Option Explicit

Public Sub GetData()
    Dim xhr As MSXML2.xmlhttp60, html As MSHTML.HTMLDocument, body As String
    'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ;  Microsoft XML, v6 (your version may vary)

    Set xhr = New MSXML2.xmlhttp60
    Set html = New MSHTML.HTMLDocument

    body = "{""tab"":""returns"",""page"":pageNumber,""per_page"":1000}"

    With xhr
        Dim page As Long
        For page = 1 To 2
            .Open "POST", "https://etfdb.com/api/screener/", False
            .setRequestHeader "User-Agent", "Mozilla/5.0"
            .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
            .send Replace$(body, "pageNumber", page)
            'debug.Print .Status
            ActiveSheet.Cells(page, 1) = .responseText
        Next
    End With

End Sub

我试图在我这边测试你的代码,我发现它没有任何问题。它正在为所有 93 个 URL 生成数据。

我在 Windows 10 64 位 OS 上用 Excel 2016 制作了这个。

您可以在控制台看到您打印的'i'变量值。

我已经看到 VBA IE 自动化代码卡在 ReadyState 行的这种问题。

为了确认这个问题,您可以尝试在您的代码中放置一个断点,您会注意到代码将不断循环 ReadyState 并且它永远不会结束。

如果您尝试 运行 在任何其他机器上使用相同的代码,您会发现该代码可以正常工作。例如,代码对我有用。

然后您也可以尝试使用以下代码示例进行测试,看看是否适合您。

  With objIE
        Do While .Busy Or .ReadyState <> 4: DoEvents: Loop
    End With

  '----------------OR---------------------------------------

    Do While objIE.Busy = True
    Loop
  '----------------OR---------------------------------------
    Do While objIE.readyState = 4: DoEvents: Loop
    Do Until objIE.readyState = 4: DoEvents: Loop
    While objIE.Busy
         DoEvents
    Wend

如果没有任何效果,请尝试再测试 Application.wait 几秒钟。

Application.Wait (Now + TimeValue("0:00:05"))

您也可以使用睡眠功能进行测试。

#If VBA7 Then ' Excel 2010 or later

    Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)

#Else ' Excel 2007 or earlier

    Public Declare Sub Sleep Lib "kernel32" (ByVal Milliseconds As Long)

Sub demo()
   For i = 0 To 10
       DoEvents
           Sleep 1000
           Debug.Print i
   Next i
End Sub

有用的线程参考:

internetexplorer.application hangs on readystate=1 in VBA