StaleElementReferenceException issue while scraping with Selenium
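I am trying to load this page in full: https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1
I have a line of code that handles the cookie popup.
Then I have a few lines that click the "load more results" button until the complete HTML is loaded, and then print it.
But after a single click I get this error:
StaleElementReferenceException: stale element reference: element is not attached to the page document
I don't know what it means or how to fix it.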
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
site = 'https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1'
wd = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe", options=options)
wd.get(site)
time.sleep(10)
wait = WebDriverWait(wd, 10)
# click cookies popup
wd.find_element_by_xpath('//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()
time.sleep(10)
# click show more button until no more results to load
while True:
    try:
        more_button = wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES'))).click()
    except TimeoutException:
        break
time.sleep(10)
print(wd.page_source)
print("Complete")
time.sleep(10)
wd.quit()
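StaleElementReferenceException: stale element reference: element is not attached to the page document
means that the reference to the element is now "stale": the element no longer appears in the page's DOM. The usual cause of this exception is that the DOM has been updated or refreshed. For example, the DOM may be updated or refreshed after an action such as click(). If you then try to interact with an element you located before the update, you get this error.
You have to find the element again in the updated or refreshed DOM: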
from selenium.common.exceptions import StaleElementReferenceException

try:
    wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES'))).click()
except StaleElementReferenceException:
    # The DOM was refreshed, so locate the button again before clicking
    more_button = WebDriverWait(wd, 10).until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')))
    more_button.click()
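The "find the element again" idea can be wrapped in a small helper. The following is only a sketch: `click_fresh`, `locate`, and `FakeButton` are hypothetical names that do not come from Selenium, and the fake button stands in for a real WebElement so the example runs without a browser.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Fallback so the sketch runs without Selenium installed
    class StaleElementReferenceException(Exception):
        pass


def click_fresh(locate, attempts=3):
    """Locate the element and click it, locating again if it went stale.

    locate: callable returning a fresh element on every call (hypothetical name)
    attempts: maximum number of locate-and-click tries
    Returns the number of tries that were needed.
    """
    for attempt in range(1, attempts + 1):
        element = locate()  # never reuse a reference from a previous try
        try:
            element.click()
            return attempt
        except StaleElementReferenceException:
            if attempt == attempts:
                raise  # give up after the last try


class FakeButton:
    """Stand-in for a WebElement; stale buttons raise on click."""

    def __init__(self, stale):
        self.stale = stale
        self.clicked = False

    def click(self):
        if self.stale:
            raise StaleElementReferenceException("stale element reference")
        self.clicked = True


# Simulate a page whose first lookup returns a button that is replaced before the click
buttons = [FakeButton(stale=True), FakeButton(stale=False)]
lookups = iter(buttons)
tries_needed = click_fresh(lambda: next(lookups))
print(tries_needed)  # prints 2
```

With a real driver, `locate` could be something like `lambda: wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')))`, so that every retry performs a new lookup instead of reusing the old reference.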
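There are many ways to deal with a stale element reference.
One is to retry the click on the web element inside a while loop.
Your link_text also looks wrong; use the XPath below instead: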
# click cookies popup
driver.find_element_by_xpath('//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()
time.sleep(10)
# click show more button until no more results to load
while True:
    try:
        more_button = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick,'tagDeClick') and contains(@href,'/offres/emploi.rechercheoffre:afficherplusderesultats')]")))
        ActionChains(driver).move_to_element(more_button).perform()
        attempts = 0
        while attempts < 2:
            try:
                more_button.click()
                break
            except StaleElementReferenceException as exception:
                print(exception.msg)
            attempts = attempts + 1
    except TimeoutException:
        break
time.sleep(10)
print(driver.page_source)
print("Complete")
time.sleep(10)
Output:
stale element reference: element is not attached to the page document
(Session info: chrome=94.0.4606.81)
If you see this in the logs and would rather not, comment out print(exception.msg).
Imports:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import StaleElementReferenceException
Try using the execute_script method; I think it is the most reliable way to solve this kind of problem.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import time
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
site = 'https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1'
wd = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe", options=options)
wd.get(site)
time.sleep(10)
wait = WebDriverWait(wd, 10)
# click cookies popup
wd.find_element_by_xpath('//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()
time.sleep(10)
# click show more button until no more results to load
while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')))
        more_button = wd.find_element_by_link_text('AFFICHER LES 20 OFFRES SUIVANTES')
        wd.execute_script('arguments[0].click()', more_button)
        #print('clicked')
    except (TimeoutException, NoSuchElementException):
        break
time.sleep(10)
print(wd.page_source)
print("Complete")
time.sleep(10)
wd.quit()
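The shape of the loop above (look the button up again on every pass, click it, stop when it is gone) can be separated from the browser. This is a rough sketch rather than part of the answer: `click_until_gone`, `find_button`, `click`, `max_clicks`, and `FakePage` are hypothetical names, and the cap on clicks is an added safety assumption so the loop cannot run forever.

```python
def click_until_gone(find_button, click, max_clicks=500):
    """Click a "load more" button until find_button() returns None.

    find_button: callable returning the current button, or None when it is gone
    click: callable that performs the click on the button it is given
    max_clicks: safety cap (an assumption of this sketch)
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()  # fresh lookup on every iteration
        if button is None:
            break
        click(button)
        clicks += 1
    return clicks


class FakePage:
    """Stand-in for a results page with a fixed number of extra batches."""

    def __init__(self, pages_left):
        self.pages_left = pages_left

    def find_button(self):
        # The button is only present while more results remain
        return "more" if self.pages_left > 0 else None

    def click(self, button):
        self.pages_left -= 1


page = FakePage(pages_left=3)
total_clicks = click_until_gone(page.find_button, page.click)
print(total_clicks)  # prints 3
```

With a real driver, `find_button` could wrap the `wait.until(...)` call and return None on TimeoutException, and `click` could be `lambda el: wd.execute_script('arguments[0].click()', el)`.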