XPath find_elements_by_xpath() returns empty list, but XPath helper extension shows results

Question

I can't get the article links (the href attributes) from a page using XPath.

So, this is the query that gives me results with the Chrome extension XPath Helper on https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1:

//table[@class="table recordList"]//@href

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from threading import Thread

url_xpath = '//table[@class="table recordList"]//@href'
url = 'https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1'
            
options = Options()
options.headless = True
# change filepath of chromedriver
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\User\Desktop\chromedriver')
    
try:
    driver.get(url)
    print("got url") #<- reaches here
    url_elements = driver.find_elements_by_xpath(url_xpath)
    print("url_elements", url_elements) # <- doesn't reach here
    for url_elements in url_elements:
        article_url = url_elements.get_attribute('href')
        print("article url", article_url)
except:  
   pass

What am I doing wrong?

Thanks

Answer 1

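The problem is the XPath expression you are using. Because it ends with //@href, the result it returns is an attribute, not an element, and Selenium raises an error. Your bare except: pass swallows that exception, which is why the second print never runs: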

Message: invalid selector: The result of the xpath expression "//table[@class="table recordList"]//@href" is: [object Attr]. It should be an element.

Besides that, the expression would also pick up the hrefs of the attachment icons, which may or may not be what you want.

To get only the article links, you can use this XPath expression:

//table[@class="table recordList"]//a[@class='ContentGrid']

If you need both the article URLs and the attachment URLs, you can use this one:

//table[@class="table recordList"]//a[@class='ContentGrid' or @title='View Files']
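
Putting the above together, here is a minimal sketch of how the corrected expression might be used: select the a elements first, then read href from each one with get_attribute(). The names collect_hrefs and main are hypothetical helpers introduced for illustration. The sketch assumes a Selenium version that provides driver.find_elements(by, value) and that chromedriver can be located without an explicit path; adjust the driver setup to match your installation.

```python
URL = "https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1"

# Selects <a> elements (not attributes), so Selenium accepts the expression.
ARTICLE_XPATH = '//table[@class="table recordList"]//a[@class="ContentGrid"]'


def collect_hrefs(driver, xpath):
    """Return the href of every element matched by xpath (hypothetical helper)."""
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", xpath)
    return [element.get_attribute("href") for element in elements]


def main():
    # Imported here so collect_hrefs can be reused without launching a browser.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    # Assumption: chromedriver is discoverable (on PATH or managed by Selenium).
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(URL)
        for href in collect_hrefs(driver, ARTICLE_XPATH):
            print("article url", href)
    finally:
        # Always close the browser; exceptions still propagate and stay visible.
        driver.quit()
```

Calling main() runs the query against the live page. Using try/finally instead of a bare except: pass means an invalid selector would show up as a traceback rather than failing silently.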

xpath

selenium-chromedriver

python-3.7