StaleElementReferenceException issue while scraping with Selenium

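I'm trying to fully load this page: https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1

I added a line of code to handle the cookie pop-up.

Then I added a few lines that keep clicking the "load more results" button so the full HTML is loaded, and then print it.

But after one click it throws this error:

StaleElementReferenceException: stale element reference: element is not attached to the page document

I don't know what this means or how to fix it.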

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import time

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

site = 'https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1'
# raw string, so the backslashes in the Windows path are not treated as escape sequences
wd = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe", options=options)
wd.get(site)

time.sleep(10)

wait = WebDriverWait(wd, 10)

# click cookies popup
wd.find_element(By.XPATH, '//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()

time.sleep(10)

# click show more button until no more results to load
while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES'))).click()
    except TimeoutException:
        break

time.sleep(10)

print(wd.page_source)
print("Complete")

time.sleep(10)
wd.quit()

StaleElementReferenceException: stale element reference: element is not attached to the page document

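means that your reference to the element is now "stale": the element no longer appears in the page's DOM. This usually happens because the DOM was updated or refreshed. For example, after an action such as click(), the page may update or re-render its DOM. If you then try to use an element reference you obtained before that update, you get this error.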
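To make "not attached to the page document" concrete, here is a toy model in plain Python with no browser involved. It is only an analogy, not how Selenium is implemented; ToyPage, ToyButton and StaleError are hypothetical names. Clicking the button re-renders the page and replaces the button node, so a reference obtained before the click goes stale, while a fresh lookup returns the node that is attached now.

```python
class StaleError(Exception):
    """Hypothetical stand-in for Selenium's StaleElementReferenceException."""


class ToyPage:
    """A toy page whose 'load more' button is replaced every time it is clicked."""

    def __init__(self):
        self.loaded = 20
        self.render()

    def render(self):
        # Re-rendering builds a brand-new button node; the old one is detached
        self.attached = ToyButton(self)

    def find_button(self):
        # Like a find_element call: returns the node that is in the page right now
        return self.attached


class ToyButton:
    def __init__(self, page):
        self.page = page

    def click(self):
        if self.page.attached is not self:
            raise StaleError("element is not attached to the page document")
        self.page.loaded += 20
        self.page.render()  # the click triggers a DOM update


page = ToyPage()
button = page.find_button()
button.click()                # works: the reference is still attached
try:
    button.click()            # same old reference after the re-render
except StaleError as exc:
    print("stale:", exc)
page.find_button().click()    # looking the button up again works
```

The old button object still exists in Python, but it is no longer part of the page, which is the same situation Selenium reports with this exception.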

You have to locate the element again in the updated or refreshed DOM:

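from selenium.common.exceptions import StaleElementReferenceException

# goes inside the question's while True: loop
try:
    wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES'))).click()
except StaleElementReferenceException:
    # the DOM changed between locating and clicking: look the button up again
    more_button = WebDriverWait(wd, 10).until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')))
    more_button.click()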
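The snippet above retries only once. If the page keeps re-rendering, a slightly more general pattern is to look the element up again on every attempt. The sketch below is plain Python so it runs without a browser; click_fresh and its parameters are hypothetical names, not part of Selenium.

```python
import time
from typing import Any, Callable, Tuple, Type


def click_fresh(locate: Callable[[], Any],
                retry_on: Tuple[Type[BaseException], ...],
                attempts: int = 3,
                delay: float = 0.5) -> None:
    """Locate an element and click it, looking it up again on every attempt.

    A stale reference never becomes valid again, so each retry calls
    locate() for a new reference instead of reusing the old one.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            element = locate()   # fresh reference from the current DOM
            element.click()
            return
        except retry_on:
            if attempt == attempts:
                raise            # out of attempts: re-raise the last error
            time.sleep(delay)    # give the page a moment to finish re-rendering
```

With Selenium, locate would be something like lambda: wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES'))) and retry_on would be (StaleElementReferenceException,). A TimeoutException raised by locate is not in retry_on, so it propagates unchanged and can still end the question's while True loop.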

There are many ways to handle stale element references.

One is to retry clicking the web element in a loop.

Your link text also looks wrong; use the XPath below instead:

# assumes the question's imports and setup, with driver = your WebDriver (wd in the question)
# and wait = WebDriverWait(driver, 10)

# click cookies popup
driver.find_element(By.XPATH, '//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()

time.sleep(10)

# click show more button until no more results to load
while True:
    try:
        attempts = 0
        while attempts < 2:
            try:
                # locate the button again on each attempt: a stale reference never becomes valid again
                more_button = wait.until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick,'tagDeClick') and contains(@href,'/offres/emploi.rechercheoffre:afficherplusderesultats')]")))
                ActionChains(driver).move_to_element(more_button).perform()
                more_button.click()
                break
            except StaleElementReferenceException as exception:
                print(exception.msg)
            attempts += 1

    except TimeoutException:
        break

time.sleep(10)

print(driver.page_source)
print("Complete")

time.sleep(10)

Output:

stale element reference: element is not attached to the page document
  (Session info: chrome=94.0.4606.81)

If you see this in the logs and don't want it there, comment out print(exception.msg).

Imports:

from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import StaleElementReferenceException, TimeoutException

Try clicking through JavaScript with the execute_script method; I think this is the most reliable way to deal with this kind of problem.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, NoSuchElementException
import time

options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

site = 'https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1'
# raw string, so the backslashes in the Windows path are not treated as escape sequences
wd = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe", options=options)
wd.get(site)

time.sleep(10)

wait = WebDriverWait(wd, 10)

# click cookies popup
wd.find_element(By.XPATH, '//*[(@id = "description")]//*[contains(concat( " ", @class, " " ), concat( " ", "tc-open-privacy-center", " " ))]').click()

time.sleep(10)

# click show more button until no more results to load
while True:
    try:
        wait.until(EC.visibility_of_element_located((By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')))
        more_button = wd.find_element(By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES')
        wd.execute_script('arguments[0].click()', more_button)
        #print('clicked')
    except (TimeoutException, NoSuchElementException):
        break

time.sleep(10)

print(wd.page_source)
print("Complete")

time.sleep(10)
wd.quit()
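The lookup-and-click step inside that loop can be pulled into a small helper. A minimal sketch, assuming driver is a Selenium WebDriver or anything with the same find_element(by, value) and execute_script(script, *args) methods; js_click is a hypothetical name. Looking the element up immediately before the JavaScript click keeps the window for a stale reference short, but it does not remove it: as far as I know, if the DOM changes between the two calls, execute_script can still raise StaleElementReferenceException.

```python
def js_click(driver, by: str, value: str):
    """Look up an element and click it through JavaScript.

    driver is assumed to be a Selenium WebDriver (or any object with
    find_element(by, value) and execute_script(script, *args)).
    Returns the element that was clicked.
    """
    # Fresh lookup right before the click, instead of reusing an old reference
    element = driver.find_element(by, value)
    # HTMLElement.click() runs inside the page; unlike a native WebDriver click,
    # it does not check whether another element covers the target
    driver.execute_script("arguments[0].click();", element)
    return element
```

In the loop above, the two lines after wait.until(...) would become js_click(wd, By.LINK_TEXT, 'AFFICHER LES 20 OFFRES SUIVANTES').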