XPath find_elements_by_xpath() returns empty list, but XPath helper extension shows results
I can't get the href attributes of the article links from a page using XPath.
This is the query that returns results in the Chrome extension XPath Helper on https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1:
//table[@class="table recordList"]//@href
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from threading import Thread
url_xpath = '//table[@class="table recordList"]//@href'
url = 'https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1'
options = Options()
options.headless = True
# change filepath of chromedriver
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\User\Desktop\chromedriver')
try:
    driver.get(url)
    print("got url")  # <- reaches here
    url_elements = driver.find_elements_by_xpath(url_xpath)
    print("url_elements", url_elements)  # <- doesn't reach here
    for url_elements in url_elements:
        article_url = url_elements.get_attribute('href')
        print("article url", article_url)
except:
    pass
Where am I going wrong?
Thanks.
The problem is the XPath expression you are using.
Because it ends with //@href, you get an error: the result it returns is not an element but an attribute:
Message: invalid selector: The result of the xpath expression "//table[@class="table recordList"]//@href" is: [object Attr]. It should be an element.
Besides that, the expression also picks up the href of the attachment icons, which may or may not be what you want.
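The bare `except: pass` in the question discards this message, which is probably why the script appeared to stop without any output. A minimal sketch of surfacing it instead: `find_or_report` is a hypothetical helper name, and `driver` is assumed to be a Selenium 3 style WebDriver that exposes `find_elements_by_xpath`, as in the question.

```python
def find_or_report(driver, xpath):
    """Return (elements, error_text); error_text is None on success."""
    try:
        return driver.find_elements_by_xpath(xpath), None
    except Exception as exc:
        # With Selenium, an attribute-selecting XPath is expected to raise
        # InvalidSelectorException; keep its text instead of using `pass`.
        return [], str(exc)
```

Printing the second value of the returned pair after the call would show the "invalid selector" message quoted above.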
To get only the article links, you can use this XPath expression:
//table[@class="table recordList"]//a[@class='ContentGrid']
If you need both the article URLs and the attachment URLs, you can use this:
//table[@class="table recordList"]//a[@class='ContentGrid' or @title='View Files']
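Both expressions select `a` elements rather than attributes, so the `href` values are read afterwards with `get_attribute`, as the loop in the question already does. A minimal sketch, assuming a Selenium 3 style `driver` like the one in the question; `collect_hrefs` and `ARTICLE_XPATH` are hypothetical names introduced here for illustration.

```python
# XPath that selects the article link elements (not their attributes).
ARTICLE_XPATH = '//table[@class="table recordList"]//a[@class="ContentGrid"]'


def collect_hrefs(driver, xpath=ARTICLE_XPATH):
    """Return the href of every element matched by `xpath`."""
    links = driver.find_elements_by_xpath(xpath)
    hrefs = []
    for link in links:
        href = link.get_attribute("href")
        if href:  # skip anchors that carry no href
            hrefs.append(href)
    return hrefs
```

With the question's setup this would be called as `collect_hrefs(driver)` after `driver.get(url)`. In newer Selenium 4 releases, where the `find_elements_by_*` helpers were removed, the equivalent lookup is `driver.find_elements(By.XPATH, xpath)`.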