Python3 - Selenium unable to find xpath provided
I am using Python 3 and Selenium to scrape some image links from the following website:
import sys
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
link_xpath = '/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img'
link_path = driver.find_element_by_xpath(link_xpath).text
print(link_path)
driver.quit()
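When you open this URL, you can see the image in question in the middle of the page. If you right-click it in Google Chrome and choose Inspect, you can then right-click the element itself in Chrome DevTools and copy the XPath for this image.
Everything looks in order to me, but when I run the code above I get the following error: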
Traceback (most recent call last):
File "G:\folder\folder\testfilepy", line 16, in <module>
link_path = driver.find_element_by_xpath(link_xpath).text
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "G:\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/main/div/div[2]/div[2]/div/div/div[2]/div/div[2]/div[1]/div/div/div[2]/div/img"}
(Session info: headless chrome=83.0.4103.61)
Can anyone tell me why Selenium is unable to find the provided xpath?
If you are working in headless mode, it is usually a good idea to set a window size. Add this line to your options:
chrome_options.add_argument('--window-size=1920,1080')
To extract the src attribute of the image, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR:
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.o-layout__item div.c-bezel.programme-content__image>img"))).get_attribute("src"))
Using XPATH:
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--headless')
options.add_argument('--window-size=1920,1080')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-layout__item']//div[@class='c-bezel programme-content__image']/img"))).get_attribute("src"))
Console output:
https://images.metadata.sky.com/pd-image/251eeec2-acb3-4733-891b-60f10f2cc28c/16-9/640
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
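The explicit wait used above can be pictured as a polling loop: the condition is retried until it returns something truthy or the timeout expires. The following is only a rough stand-in for that idea, not Selenium's implementation. The names wait_until and WaitTimeout are made up for illustration, LookupError stands in for Selenium's NoSuchElementException, and the condition is a plain function rather than a real expected condition.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(LookupError,)):
    """Call `condition` repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises WaitTimeout if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # element not there yet, keep polling
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.2f s" % timeout)
        time.sleep(poll)


# Simulated page: the image only "appears" on the third lookup.
attempts = {"n": 0}


def image_src():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise LookupError("element not rendered yet")
    return "https://example.invalid/image.png"  # placeholder URL


src = wait_until(image_src, timeout=2.0, poll=0.01)
print(src, "after", attempts["n"], "attempts")
```

A single find_element call, as in the question, corresponds to running the condition once with no retries, which is why it can fail on a page that is still rendering.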
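Your xpath appears to be correct. You cannot locate the element because you forgot to handle the cookie prompt. Try it yourself: pause the driver for a few seconds and click to accept all cookies, and you will then see your element. There are several ways to handle cookies. I was able to find the element using my own, cleaner xpath, reaching the element from its nearest parent.
Hope this helps.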
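One possible way to handle the cookie prompt in code is sketched below. It assumes the consent button can be matched with a CSS selector. The helper name dismiss_cookie_banner and the selector in the usage note are hypothetical, and on a real site the banner may sit inside an iframe that has to be switched into first. The helper only relies on the driver's find_elements(by, value) method, so it is written against any object that provides it.

```python
import time


def dismiss_cookie_banner(driver, selector, timeout=10.0, poll=0.5):
    """Click the first element matching `selector` if it shows up in time.

    `driver` is expected to offer Selenium's find_elements(by, value).
    Returns True if a button was clicked, False if none appeared.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR
        buttons = driver.find_elements("css selector", selector)
        if buttons:
            buttons[0].click()
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

With a real driver this might be called right after driver.get(...), for example dismiss_cookie_banner(driver, "button.accept-all"), where "button.accept-all" is a placeholder that has to be replaced with whatever the actual banner uses.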
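Your xpath is correct, but do not use an absolute path, because it breaks easily when the page changes. Try this relative xpath: //div[@class="c-bezel programme-content__image"]//img
To get what you want, use .get_attribute("src") instead of .text: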
driver.get('https://www.sky.com/tv-guide/20200605/4101-1/107/Efe2-364')
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//div[@class="c-bezel programme-content__image"]//img')))
print(element.get_attribute("src"))
driver.quit()
Or, better, use a CSS selector. This should be faster:
element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.c-bezel.programme-content__image > img')))
Reference: https://selenium-python.readthedocs.io/locating-elements.html
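To illustrate why a positional path is more fragile than a class-based one, here is a small sketch that uses Python's standard xml.etree.ElementTree instead of Selenium, so it runs without a browser. The two markup snippets are invented for the example and are far simpler than the real page. ElementTree supports only a subset of XPath and paths are evaluated relative to the root element, hence the leading dot.

```python
import xml.etree.ElementTree as ET

# Invented markup: the image sits inside a div identified by its class.
ORIGINAL = """
<html><body><main>
  <div>
    <div class="c-bezel programme-content__image">
      <img src="https://example.invalid/a.png"/>
    </div>
  </div>
</main></body></html>
""".strip()

# Same content after a hypothetical redesign that adds one wrapper <div>.
REDESIGNED = """
<html><body><main>
  <div><div>
    <div class="c-bezel programme-content__image">
      <img src="https://example.invalid/a.png"/>
    </div>
  </div></div>
</main></body></html>
""".strip()

POSITIONAL = "./body/main/div/div/img"  # depends on the exact nesting
BY_CLASS = ".//div[@class='c-bezel programme-content__image']//img"


def find_src(markup, path):
    """Return the src of the first node matching `path`, or None."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.get("src")


for name, markup in (("original", ORIGINAL), ("redesigned", REDESIGNED)):
    print(name, "positional:", find_src(markup, POSITIONAL))
    print(name, "by class:  ", find_src(markup, BY_CLASS))
```

The positional path stops matching once the extra wrapper is added, while the class-based path keeps finding the image in both versions.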