Web crawling a table of links
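I am writing a Python script that goes through a table with three columns. I build a list that holds the link from the first column of every row, then loop over it. In each iteration I click the link, print a statement to confirm that the click actually went through, and go back to the previous page so that the next link can be clicked. The loop gets through the first two links, but when it calls links[page].click() for the third time I get a StaleElementReferenceException. I can't post the HTML because the site is confidential.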
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import traceback
# starting chrome browser
chrome_path = r"C:\Users\guaddavi\Downloads\chromedriver_win32 extract\chromedriver.exe"
browser = webdriver.Chrome(chrome_path)
#linking to page
browser.get('link to page with table ')
#find table of ETL Extracts
table_id = browser.find_element_by_id('sortable_table_id_0')
#print('found table')
#get all the rows of the table containing the links
rows = table_id.find_elements_by_tag_name('tr')
#remove the first row that has the header
del rows[0]
current = 0
links = [] * len(rows)
for row in rows:
    col = row.find_elements_by_tag_name('td')[0]
    links.append(col)
    current +=1
page = 0
while(page <= len(rows)):
    links[page].click()
    print('clicked link' + " " + str(page))
    page += 1
    browser.back()
I'm not sure whether you have already looked at the official Selenium documentation:
A stale element reference exception is thrown in one of two cases, the first being more common than the second:
The element has been deleted entirely.
The element is no longer attached to the DOM.
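In your case, I think you are running into the second one. Every time you click a link and go back inside the loop, the DOM changes, so the elements you stored in links before navigating are no longer attached to the page that is displayed. Please check that.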
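One way to avoid the stale references, as a rough sketch rather than a tested fix for your site: look the cells up again after every browser.back() instead of reusing the list built before the first click. The helper names click_each_link and first_column_cells are made up for illustration. The table id is taken from your script, and find_element(By.ID, ...) is the locator style that also works in Selenium 4, where find_element_by_id is no longer available. The loop uses range(count) rather than page <= len(rows), which would step one past the last row.

```python
def click_each_link(browser, locate_cells):
    """Click every cell returned by locate_cells, one page visit at a time.

    locate_cells is called again on every pass, so the element that gets
    clicked always belongs to the page that is currently displayed.
    """
    count = len(locate_cells(browser))
    for index in range(count):
        cells = locate_cells(browser)  # fresh references after each back()
        cells[index].click()
        print('clicked link ' + str(index))
        browser.back()
    return count


def first_column_cells(browser):
    """Return the first <td> of every non-header row of the table.

    Assumes the table still has the id 'sortable_table_id_0' after going back.
    """
    # Imported here so that click_each_link itself does not depend on Selenium.
    from selenium.webdriver.common.by import By

    table = browser.find_element(By.ID, 'sortable_table_id_0')
    rows = table.find_elements(By.TAG_NAME, 'tr')[1:]  # skip the header row
    return [row.find_elements(By.TAG_NAME, 'td')[0] for row in rows]
```

With the browser object from your script this would be called as click_each_link(browser, first_column_cells). If the table is filled in by JavaScript after the page loads, you may also need an explicit wait (WebDriverWait) before looking the cells up again.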