find_elements_by_xpath() returns empty list, but XPath Helper extension shows results

I can't get the article links (the href attributes) from a page using XPath.

This is the query I ran with the Chrome extension XPath Helper on https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1, where it returns results:

//table[@class="table recordList"]//@href

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from threading import Thread

url_xpath = '//table[@class="table recordList"]//@href'
url = 'https://www.ethics.senate.gov/public/index.cfm/dearcolleagueletters?page=1'
            
options = Options()
options.headless = True
# change filepath of chromedriver
driver = webdriver.Chrome(options=options, executable_path=r'C:\Users\User\Desktop\chromedriver')
    
try:
    driver.get(url)
    print("got url") #<- reaches here
    url_elements = driver.find_elements_by_xpath(url_xpath)
    print("url_elements", url_elements) # <- doesn't reach here
    for url_element in url_elements:
        article_url = url_element.get_attribute('href')
        print("article url", article_url)
except:  
   pass  

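Where am I going wrong?

Thanks

The problem is the XPath expression you are using. Because it ends in //@href, Selenium raises an error: the result of the expression is not an element but an attribute: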

Message: invalid selector: The result of the xpath expression "//table[@class="table recordList"]//@href" is: [object Attr]. It should be an element.

Besides that, the expression would also pick up the hrefs of the attachment icons, which may or may not be what you want.
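
As a side note, the bare `except: pass` in the question hides this message, which is why the script seems to stop silently after "got url". Here is a minimal sketch of a lookup that reports the failure instead. It assumes `driver` is the `webdriver.Chrome` instance from the question; `find_or_report` is a hypothetical helper name, and `"xpath"` is the string value of Selenium's `By.XPATH` constant.

```python
def find_or_report(driver, xpath):
    """Return the elements matched by xpath, printing any lookup error."""
    try:
        # Same lookup as find_elements_by_xpath(xpath) in the question;
        # "xpath" is the value of selenium's By.XPATH constant.
        return driver.find_elements("xpath", xpath)
    except Exception as exc:  # e.g. the invalid selector error quoted above
        # Report the problem instead of silently discarding it.
        print("lookup failed:", exc)
        return []
```

With this in place, running the original `//@href` expression should print the invalid selector message rather than nothing.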

To get only the article links, you can use this XPath expression:

//table[@class="table recordList"]//a[@class='ContentGrid']

If you need both the article URLs and the attachment URLs, you can use this one:

//table[@class="table recordList"]//a[@class='ContentGrid' or @title='View Files']
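
Both expressions select `a` elements rather than attributes, so the `get_attribute('href')` call from the question then works on each result. A minimal sketch, again assuming `driver` is the already created `webdriver.Chrome` from the question; `collect_hrefs` and the two constant names are made up for illustration.

```python
# Article links only.
ARTICLE_LINKS = '//table[@class="table recordList"]//a[@class="ContentGrid"]'
# Article links plus attachment links.
ALL_LINKS = ('//table[@class="table recordList"]'
             '//a[@class="ContentGrid" or @title="View Files"]')


def collect_hrefs(driver, xpath=ARTICLE_LINKS):
    """Return the href attribute of every element matched by xpath."""
    # "xpath" is the value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", xpath)
    # Read the attribute from each element in Python, not in the XPath.
    return [element.get_attribute("href") for element in elements]
```

Calling `collect_hrefs(driver)` after `driver.get(url)` should return the article URLs; passing `ALL_LINKS` as the second argument includes the attachment URLs as well.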