Python3 - Selenium unable to find xpath provided

I am using Python 3 and Selenium to scrape some image links from the following website:

import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

chrome_options = Options()  
chrome_options.add_argument("--headless")

driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')

link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'

link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)

driver.quit()

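When you open this URL, you can see the image in question in the middle of the page. If you right-click it in Google Chrome and choose Inspect, you can then right-click the element itself in Chrome Dev Tools and copy the XPath of this image.

Everything looks in order to me, but when I run the code above I get the following error: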

Traceback (most recent call last):
  File "G:\folder\folder\testfilepy", line 16, in <module>
    link_path = driver.find_element_by_xpath(link_xpath).text
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
  (Session info: headless chrome=83.0.4103.61)

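Can anyone tell me why Selenium is unable to find the XPath provided?

If you are working in headless mode, it is usually a good idea to set the window size. Add this line to your options: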

chrome_options.add_argument('window-size=1920x1080')
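
A minimal sketch of how that option might be wired into a small driver factory. The helper names `headless_arguments` and `make_headless_chrome` are made up for illustration, and the comma-separated `--window-size=width,height` form is used here. Selenium is imported lazily so the argument builder can be checked without a browser installed.

```python
def headless_arguments(width=1920, height=1080):
    """Return the Chrome switches for a headless session with a fixed window size."""
    return ["--headless", f"--window-size={width},{height}"]


def make_headless_chrome(width=1920, height=1080):
    """Create a headless Chrome driver (assumes chromedriver can be found)."""
    # Imported here so headless_arguments() stays usable without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for argument in headless_arguments(width, height):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

A fixed, desktop-sized window matters because a headless browser otherwise starts with a small default viewport, and responsive pages may lay out (or omit) elements differently at that size.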

To extract the src attribute of the image, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    options = webdriver.ChromeOptions() 
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--headless')
    options.add_argument('--window-size=1920,1080')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))
    
  • Using XPATH:

    options = webdriver.ChromeOptions() 
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--headless')
    options.add_argument('--window-size=1920,1080')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')     
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))
    
  • Console output:

    https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
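
Conceptually, an explicit wait just re-checks a condition at short intervals until it yields a truthy value or the timeout expires. The sketch below imitates that loop in plain Python so the mechanism is visible. `wait_until` and `WaitTimeout` are hypothetical names, not Selenium's API; the real `WebDriverWait` additionally passes the driver to the condition and raises Selenium's own `TimeoutException`.

```python
import time


class WaitTimeout(Exception):
    """Stand-in for Selenium's TimeoutException (hypothetical name)."""


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have
    passed without success. Simplified imitation of an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With a real driver, `WebDriverWait(driver, 20).until(EC.visibility_of_element_located(...))` plays the role of `wait_until` here: the element is returned as soon as it is visible, so the script does not have to guess how long the page needs to render.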
    

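Your XPath seems to be correct. You cannot locate the element because you forgot to handle the cookies. Try it yourself: pause the driver for a few seconds and click to agree to all cookies. Then you will see your element. There are multiple ways to handle cookies. I was able to locate the element using my own, cleaner XPath, accessing the element from its nearest parent.

Hope this helps.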
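
One way the "pause, then accept the cookies" idea might look in code. The locator in `CONSENT_BUTTON` is purely hypothetical (inspect the live page for the real one), and `dismiss_consent` is an illustrative helper name. `find_elements` is used because it returns an empty list rather than raising when nothing matches.

```python
import time

# Hypothetical locator for the "accept all" button; "css selector" is the
# string value behind Selenium's By.CSS_SELECTOR.
CONSENT_BUTTON = ("css selector", "button.accept-all-cookies")


def dismiss_consent(driver, locator=CONSENT_BUTTON, pause=3.0):
    """Wait `pause` seconds, then click the consent button if it exists.

    Returns True if a button was clicked, False if none was found.
    """
    time.sleep(pause)  # give the banner time to appear
    by, value = locator
    matches = driver.find_elements(by, value)
    if not matches:
        return False
    matches[0].click()
    return True
```

Call `dismiss_consent(driver)` right after `driver.get(...)` and before looking for the image. If the banner turns out to be rendered inside an iframe, the driver would first have to switch into it with `driver.switch_to.frame(...)`.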

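You have the right XPath, but don't use an absolute path; it breaks easily when the page changes. Try this relative XPath: //div[@class="c-bezel programme-content__image"]//img.

To achieve what you want, use .get_attribute("src") instead of .text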
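
To illustrate why a positional path is fragile, here is a small stand-alone sketch using Python's built-in `xml.etree.ElementTree` (which supports only a subset of XPath) rather than Selenium. The markup, the URL, and the inserted banner are invented for the example; they only mimic the class names mentioned above.

```python
import xml.etree.ElementTree as ET

# Invented fragment that loosely mimics the structure discussed above.
ORIGINAL = (
    '<main><div>'
    '<div class="header"/>'
    '<div class="o-layout__item">'
    '<div class="c-bezel programme-content__image">'
    '<img src="https://example.com/poster.jpg"/>'
    '</div></div>'
    '</div></main>'
)

# The same page after an extra element is inserted ahead of the content.
WITH_BANNER = ORIGINAL.replace(
    '<div class="header"/>',
    '<div class="cookie-banner"/><div class="header"/>',
)

POSITIONAL = "./div/div[2]/div/img"  # depends on the exact nesting and order
RELATIVE = ".//div[@class='c-bezel programme-content__image']//img"


def image_src(markup, path):
    """Return the src of the first <img> matched by `path`, or None."""
    img = ET.fromstring(markup).find(path)
    return None if img is None else img.get("src")


print(image_src(ORIGINAL, POSITIONAL))     # found
print(image_src(WITH_BANNER, POSITIONAL))  # None: div[2] is now the header
print(image_src(WITH_BANNER, RELATIVE))    # still found
```

The positional path stops matching as soon as one sibling is added, while the class-based path keeps working because it does not depend on where the container sits in the tree.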

driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()
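
As for why `.text` would come back empty: an `<img>` element has no text content, and the URL is stored in its `src` attribute. The following stand-alone illustration uses Python's built-in `html.parser` on an invented fragment (the markup and URL are placeholders, not the real page); `ImgCollector` and `collect` are hypothetical names.

```python
from html.parser import HTMLParser

# Invented fragment: an image wrapped in a container, with no text at all.
SAMPLE = (
    '<div class="c-bezel programme-content__image">'
    '<img src="https://example.com/poster.jpg" alt="Poster">'
    '</div>'
)


class ImgCollector(HTMLParser):
    """Record the src of every <img> and any non-blank text in the markup."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.sources.append(dict(attrs).get("src"))

    def handle_data(self, data):
        if data.strip():
            self.text_chunks.append(data.strip())


def collect(markup):
    """Return (list of img src values, list of text chunks) for `markup`."""
    collector = ImgCollector()
    collector.feed(markup)
    return collector.sources, collector.text_chunks


sources, text_chunks = collect(SAMPLE)
print(sources)      # the URL lives in the attribute
print(text_chunks)  # no text content to read
```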

Or, better still, use a CSS selector. This should be faster:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))

Reference: https://selenium-python.readthedocs.io/locating-elements.html