尝试从多个 URL 导入数据,但总是卡在第一个 URL
Trying to Import Data from Several URLs but Always Stuck on the First URL
我想我走错了路,现在我真的偏离了正轨。我正在尝试遍历 93 URLs 并从每个中导入数据。这是我正在测试的代码。
Sub Web_Table()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
Dim sht As Worksheet
Dim LastRow As Long
Set sht = ActiveSheet
lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
For i = 1 To 93
objIE.Navigate "https://etfdb.com/screener/#tab=returns&page=" & i
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
'Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
Debug.Print i
lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
Next i
objIE.Quit
End Sub
我期待第一眼看到这个 URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
SPY SPDR S&P 500 ETF -2.44% 7.19% -2.19% 10.36% 36.10% 65.39%
IVV iShares Core S&P 500 ETF -2.46% 7.20% -2.22% 10.48% 36.60% 65.41%
VTI Vanguard Total Stock Market ETF -2.45% 7.88% -2.58% 9.38% 33.94% 60.89%
etc.
然后第二个URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
VGT Vanguard Information Technology ETF 0.15% 8.00% 11.98% 34.51% 98.99% 168.56%
XLK Technology Select Sector SPDR Fund 0.11% 7.44% 12.41% 36.64% 92.76% 161.90%
etc.
接下来是第三个 URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
IXUS iShares Core MSCI Total International Stock ETF -2.42% 9.27% -10.66% -0.97% 4.29% 12.08%
SCHF Schwab International Equity ETF -2.48% 9.96% -10.05% -0.66% 4.24% 11.43%
etc.
由于某些奇怪的原因,它似乎卡在第一个 URL 上,而它永远不会进入第二个、第三个等。我有 objIE.Navigate
和 Do Until objIE.ReadyState = 4 And Not objIE.Busy
.我在这里没有看到什么?
您需要留出足够的时间让页面完全加载。因此,在页面上寻找仅在加载所有所需数据时才出现的内容,然后对其进行测试。例如,我经常测试结果中的行数 table。
就个人而言,如果允许抓取,我会切换到 xmlhttp 以避免时间问题。您可以更改 per_page 并以更少的请求获得更多结果。这是一个例子:
您将需要一个 json 解析器,例如 jsonconverter.bas,以根据需要将返回的 json 解析为 table。
Option Explicit
Public Sub GetData()
Dim xhr As MSXML2.xmlhttp60, html As MSHTML.HTMLDocument, body As String
'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ; Microsoft XML, v6 (your version may vary)
Set xhr = New MSXML2.xmlhttp60
Set html = New MSHTML.HTMLDocument
body = "{""tab"":""returns"",""page"":pageNumber,""per_page"":1000}"
With xhr
Dim page As Long
For page = 1 To 2
.Open "POST", "https://etfdb.com/api/screener/", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send Replace$(body, "pageNumber", page)
'debug.Print .Status
ActiveSheet.Cells(page, 1) = .responseText
Next
End With
End Sub
我试图在我这边测试你的代码,我发现它没有任何问题。它正在为所有 93 个 URL 生成数据。
我在 Windows 10 64 位 OS 上用 Excel 2016 制作了这个。
您可以在控制台看到您打印的'i'变量值。
我已经看到 VBA IE 自动化代码卡在 ReadyState 行的这种问题。
为了确认这个问题,您可以尝试在您的代码中放置一个断点,您会注意到代码将不断循环 ReadyState 并且它永远不会结束。
如果您尝试 运行 在任何其他机器上使用相同的代码,您会发现该代码可以正常工作。例如,代码对我有用。
然后您也可以尝试使用以下代码示例进行测试,看看是否适合您。
With objIE
Do While .Busy Or .ReadyState <> 4: DoEvents: Loop
End With
'----------------OR---------------------------------------
Do While objIE.Busy = True
Loop
'----------------OR---------------------------------------
Do While objIE.readyState = 4: DoEvents: Loop
Do Until objIE.readyState = 4: DoEvents: Loop
While objIE.Busy
DoEvents
Wend
如果没有任何效果,请尝试再测试 Application.wait 几秒钟。
Application.Wait (Now + TimeValue("0:00:05"))
您也可以使用睡眠功能进行测试。
#If VBA7 Then ' Excel 2010 or later
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
#Else ' Excel 2007 or earlier
Public Declare Sub Sleep Lib "kernel32" (ByVal Milliseconds As Long)
Sub demo()
For i = 0 To 10
DoEvents
Sleep 1000
Debug.Print i
Next i
End Sub
有用的线程参考:
internetexplorer.application hangs on readystate=1 in VBA
我想我走错了路,现在我真的偏离了正轨。我正在尝试遍历 93 URLs 并从每个中导入数据。这是我正在测试的代码。
Sub Web_Table()
Dim HTMLDoc As New HTMLDocument
Dim objTable As Object
Dim lRow As Long
Dim lngTable As Long
Dim lngRow As Long
Dim lngCol As Long
Dim ActRw As Long
Dim objIE As InternetExplorer
Set objIE = New InternetExplorer
Dim sht As Worksheet
Dim LastRow As Long
Set sht = ActiveSheet
lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
For i = 1 To 93
objIE.Navigate "https://etfdb.com/screener/#tab=returns&page=" & i
Do Until objIE.ReadyState = 4 And Not objIE.Busy
DoEvents
Loop
'Application.Wait (Now + TimeValue("0:00:03")) 'wait for java script to load
HTMLDoc.body.innerHTML = objIE.Document.body.innerHTML
With HTMLDoc.body
Set objTable = .getElementsByTagName("table")
For lngTable = 0 To objTable.Length - 1
For lngRow = 0 To objTable(lngTable).Rows.Length - 1
For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length - 1
ThisWorkbook.Sheets("Sheet1").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
Next lngCol
Next lngRow
ActRw = ActRw + objTable(lngTable).Rows.Length + 1
Next lngTable
End With
Debug.Print i
lngRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
Next i
objIE.Quit
End Sub
我期待第一眼看到这个 URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
SPY SPDR S&P 500 ETF -2.44% 7.19% -2.19% 10.36% 36.10% 65.39%
IVV iShares Core S&P 500 ETF -2.46% 7.20% -2.22% 10.48% 36.60% 65.41%
VTI Vanguard Total Stock Market ETF -2.45% 7.88% -2.58% 9.38% 33.94% 60.89%
etc.
然后第二个URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
VGT Vanguard Information Technology ETF 0.15% 8.00% 11.98% 34.51% 98.99% 168.56%
XLK Technology Select Sector SPDR Fund 0.11% 7.44% 12.41% 36.64% 92.76% 161.90%
etc.
接下来是第三个 URL:
Symbol ETF Name 1 Week 4 Week YTD 1 Year 3 Year 5 Year Returns Rating
IXUS iShares Core MSCI Total International Stock ETF -2.42% 9.27% -10.66% -0.97% 4.29% 12.08%
SCHF Schwab International Equity ETF -2.48% 9.96% -10.05% -0.66% 4.24% 11.43%
etc.
由于某些奇怪的原因,它似乎卡在第一个 URL 上,而它永远不会进入第二个、第三个等。我有 objIE.Navigate
和 Do Until objIE.ReadyState = 4 And Not objIE.Busy
.我在这里没有看到什么?
您需要留出足够的时间让页面完全加载。因此,在页面上寻找仅在加载所有所需数据时才出现的内容,然后对其进行测试。例如,我经常测试结果中的行数 table。
就个人而言,如果允许抓取,我会切换到 xmlhttp 以避免时间问题。您可以更改 per_page 并以更少的请求获得更多结果。这是一个例子:
您将需要一个 json 解析器,例如 jsonconverter.bas,以根据需要将返回的 json 解析为 table。
Option Explicit
Public Sub GetData()
Dim xhr As MSXML2.xmlhttp60, html As MSHTML.HTMLDocument, body As String
'required VBE (Alt+F11) > Tools > References > Microsoft HTML Object Library ; Microsoft XML, v6 (your version may vary)
Set xhr = New MSXML2.xmlhttp60
Set html = New MSHTML.HTMLDocument
body = "{""tab"":""returns"",""page"":pageNumber,""per_page"":1000}"
With xhr
Dim page As Long
For page = 1 To 2
.Open "POST", "https://etfdb.com/api/screener/", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.send Replace$(body, "pageNumber", page)
'debug.Print .Status
ActiveSheet.Cells(page, 1) = .responseText
Next
End With
End Sub
我试图在我这边测试你的代码,我发现它没有任何问题。它正在为所有 93 个 URL 生成数据。
我在 Windows 10 64 位 OS 上用 Excel 2016 制作了这个。
您可以在控制台看到您打印的'i'变量值。
我已经看到 VBA IE 自动化代码卡在 ReadyState 行的这种问题。
为了确认这个问题,您可以尝试在您的代码中放置一个断点,您会注意到代码将不断循环 ReadyState 并且它永远不会结束。
如果您尝试 运行 在任何其他机器上使用相同的代码,您会发现该代码可以正常工作。例如,代码对我有用。
然后您也可以尝试使用以下代码示例进行测试,看看是否适合您。
With objIE
Do While .Busy Or .ReadyState <> 4: DoEvents: Loop
End With
'----------------OR---------------------------------------
Do While objIE.Busy = True
Loop
'----------------OR---------------------------------------
Do While objIE.readyState = 4: DoEvents: Loop
Do Until objIE.readyState = 4: DoEvents: Loop
While objIE.Busy
DoEvents
Wend
如果没有任何效果,请尝试再测试 Application.wait 几秒钟。
Application.Wait (Now + TimeValue("0:00:05"))
您也可以使用睡眠功能进行测试。
#If VBA7 Then ' Excel 2010 or later
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
#Else ' Excel 2007 or earlier
Public Declare Sub Sleep Lib "kernel32" (ByVal Milliseconds As Long)
Sub demo()
For i = 0 To 10
DoEvents
Sleep 1000
Debug.Print i
Next i
End Sub
有用的线程参考:
internetexplorer.application hangs on readystate=1 in VBA