How to extract the href attribute of the first search result using Selenium and Python

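I have a list of books in Excel, and I want to fill in a summary column for each book. To do that, I plan to open goodreads.com, search for "harry potter", open the first result that appears, and copy the summary text. However, I can't get the link of the first search result. Here is my code: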

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver=webdriver.Chrome()
driver.get('https://goodreads.com')


loginbox=driver.find_element_by_xpath('//*[@id="userSignInFormEmail"]')
loginbox.send_keys('shivam01anand@gmail.com')
passwordbox=driver.find_element_by_xpath('//*[@id="user_password"]')
passwordbox.send_keys('shivam03')
loginButton=driver.find_element_by_xpath('//*[@id="sign_in"]/div[3]/input[1]')
loginButton.click()

searchbox=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]')
searchbox.send_keys('harry potter')

searchButton=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/button')
searchButton.click()

elem=driver.find_element_by_css_selector("bookTitle").get_attribute("href")
print(elem)
#elem = driver.find_element_by_css_selector("bookTitle [href]")

Error:

NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]"}
  (Session info: chrome=83.0.4103.116)

This error only appears when I add the elem line, which is strange because the error is raised on an earlier line. I'm completely baffled.

To print the value of the href attribute of the first search result, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get("https://goodreads.com")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("shivam01anand@gmail.com")
    driver.find_element(By.XPATH, "//input[@id='user_password']").send_keys("shivam03")
    driver.find_element(By.XPATH, "//input[@value='Sign in']").click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
    driver.find_element(By.XPATH, "//button[@aria-label='Search']").click()
    # Extract the href attribute of the first search result using CSS_SELECTOR
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tableList > tbody > tr td a.bookTitle"))).get_attribute("href"))
    
  • Using XPATH:

    driver.get("https://goodreads.com")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("shivam01anand@gmail.com")
    driver.find_element(By.XPATH, "//input[@id='user_password']").send_keys("shivam03")
    driver.find_element(By.XPATH, "//input[@value='Sign in']").click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
    driver.find_element(By.XPATH, "//button[@aria-label='Search']").click()
    # Extract the href attribute of the first search result using XPATH
    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tableList']/tbody/tr//td//a[@class='bookTitle']"))).get_attribute("href"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console output:

    https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone?from_search=true&from_srp=true&qid=3nIjRXwsfG&rank=1
    
