How to extract the href attribute of the first search result using Selenium and Python

Question

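I have a list of books in an Excel sheet and I want to fill in a summary column for each book. To do this, I plan to open goodreads.com, search for "harry potter", open the first result that appears, and copy-paste the summary text. However, I can't get the link of the first search result. Here is my code. Link I referred to: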

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver=webdriver.Chrome()
driver.get('https://goodreads.com')


loginbox=driver.find_element_by_xpath('//*[@id="userSignInFormEmail"]')
loginbox.send_keys('your_email@example.com')  # placeholder: use your own account
passwordbox=driver.find_element_by_xpath('//*[@id="user_password"]')
passwordbox.send_keys('your_password')  # placeholder: use your own password
loginButton=driver.find_element_by_xpath('//*[@id="sign_in"]/div[3]/input[1]')
loginButton.click()

searchbox=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]')
searchbox.send_keys('harry potter')

searchButton=driver.find_element_by_xpath('/html/body/div[2]/div/header/div[2]/div/div[2]/form/button')
searchButton.click()

elem=driver.find_element_by_css_selector("bookTitle").get_attribute("href")
print(elem)
#elem = driver.find_element_by_css_selector("bookTitle [href]")

Error: NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div/header/div[2]/div/div[2]/form/input[1]"}
  (Session info: chrome=83.0.4103.116)

This error only appears when I add the elem line, which is strange because the error points to an earlier line. I'm confused.

Answer 1

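The NoSuchElementException is most likely a timing problem: right after loginButton.click() the script looks up the search box before the next page has finished loading, and the same applies to the search results after searchButton.click(). Separately, the selector "bookTitle" is a type selector that looks for a <bookTitle> element; matching the class requires a leading dot, e.g. a.bookTitle. To print the value of the href attribute of the first search result, wait for it with WebDriverWait and the visibility_of_element_located() expected condition, using either of the following locator strategies: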

Using CSS_SELECTOR:

# Selenium 4.3+ removed find_element_by_*(); find_element(By.<STRATEGY>, ...) is the replacement
# Replace the placeholder credentials with your own
driver.get("https://goodreads.com")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("your_email@example.com")
driver.find_element(By.XPATH, "//input[@id='user_password']").send_keys("your_password")
driver.find_element(By.XPATH, "//input[@value='Sign in']").click()
# wait until the post-login page exposes the search box before typing
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
driver.find_element(By.XPATH, "//button[@aria-label='Search']").click()
# extract the href attribute of the first search result using a CSS selector
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.tableList > tbody > tr td a.bookTitle"))).get_attribute("href"))
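To see what the selector "table.tableList > tbody > tr td a.bookTitle" picks out without launching a browser, here is a rough standard-library approximation using html.parser. It is only a sketch: SAMPLE_HTML is a hypothetical, simplified stand-in for the results page (not Goodreads' real markup), FirstBookTitleLink is a hypothetical helper, and it only checks that the link sits inside a table.tableList rather than enforcing the full tbody > tr > td chain. The raw attribute is relative, so it is resolved with urljoin against an assumed base URL; Selenium's get_attribute("href") already returns the absolute URL, as the console output shows.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstBookTitleLink(HTMLParser):
    """Record the href of the first <a class="bookTitle"> inside <table class="tableList">.

    Rough approximation of "table.tableList > tbody > tr td a.bookTitle":
    it checks the table and link classes, not the intermediate tbody/tr/td chain.
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0  # > 0 while inside a table.tableList (counts nested tables)
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "table":
            if self.table_depth or "tableList" in classes:
                self.table_depth += 1
        elif tag == "a" and self.table_depth and self.href is None and "bookTitle" in classes:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth:
            self.table_depth -= 1


# Hypothetical markup: one decoy link outside the table, then two result rows
SAMPLE_HTML = """
<a class="bookTitle" href="/book/show/outside">Not in the results table</a>
<table class="tableList">
  <tbody>
    <tr><td><a class="bookTitle" href="/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone">First result</a></td></tr>
    <tr><td><a class="bookTitle" href="/book/show/second-result">Second result</a></td></tr>
  </tbody>
</table>
"""

BASE_URL = "https://www.goodreads.com/search?q=harry+potter"  # assumed page URL

parser = FirstBookTitleLink()
parser.feed(SAMPLE_HTML)
parser.close()
first_url = urljoin(BASE_URL, parser.href) if parser.href else None
print(first_url)  # https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone
```

The decoy link outside the table is skipped, which mirrors why the answer scopes the selector to table.tableList instead of matching any a.bookTitle on the page.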

Using XPATH:

# Selenium 4.3+ removed find_element_by_*(); find_element(By.<STRATEGY>, ...) is the replacement
# Replace the placeholder credentials with your own
driver.get("https://goodreads.com")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//input[@id='userSignInFormEmail']"))).send_keys("your_email@example.com")
driver.find_element(By.XPATH, "//input[@id='user_password']").send_keys("your_password")
driver.find_element(By.XPATH, "//input[@value='Sign in']").click()
# wait until the post-login page exposes the search box before typing
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("harry potter")
driver.find_element(By.XPATH, "//button[@aria-label='Search']").click()
# extract the href attribute of the first search result using XPath
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='tableList']/tbody/tr//td//a[@class='bookTitle']"))).get_attribute("href"))
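The same XPath can be tried offline with xml.etree.ElementTree, which implements a limited XPath subset. This is a sketch under two assumptions: SAMPLE_XHTML is a hypothetical, well-formed stand-in for the results page (ElementTree cannot parse real, tag-soup HTML), and the path gets a leading "." because ElementTree does not accept absolute paths on an element. Like visibility_of_element_located(), find() returns the first matching element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the results page (not Goodreads' real markup)
SAMPLE_XHTML = """
<html>
  <body>
    <table class="tableList">
      <tbody>
        <tr><td><a class="bookTitle" href="/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone">First result</a></td></tr>
        <tr><td><a class="bookTitle" href="/book/show/second-result">Second result</a></td></tr>
      </tbody>
    </table>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE_XHTML.strip())
# The answer's XPath, made relative with a leading "."
first_link = root.find(".//table[@class='tableList']/tbody/tr//td//a[@class='bookTitle']")
first_href = first_link.get("href") if first_link is not None else None
print(first_href)  # /book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone
```

Note that [@class='bookTitle'] is an exact string comparison: it would not match an element whose class attribute is "bookTitle other". In the browser's XPath 1.0 engine, contains(@class, 'bookTitle') is the looser alternative; ElementTree does not support contains().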

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
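WebDriverWait(driver, timeout).until(condition) keeps re-evaluating the condition until it returns a truthy value or the timeout expires; by default it polls every 0.5 seconds, ignores NoSuchElementException while polling, and raises TimeoutException at the end. That is why it fixes the question's error where a bare find_element call fails immediately. Below is a minimal pure-Python sketch of that polling idea, not Selenium's implementation: wait_until and the simulated find_first_result are hypothetical, and LookupError/TimeoutError stand in for Selenium's NoSuchElementException/TimeoutException.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except LookupError:
            # Stand-in for NoSuchElementException: element not present yet, keep polling
            pass
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


# Simulated page load: the "element" only exists from the third poll onwards
attempts = {"count": 0}


def find_first_result():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("no such element")
    return "https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone"


print(wait_until(find_first_result, timeout=5, poll_frequency=0.01))
```

A bare find_element call corresponds to calling find_first_result() once: it would raise on the first attempt, which is what happens when the script races ahead of the page.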

Console output of the Selenium snippets:

https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone?from_search=true&from_srp=true&qid=3nIjRXwsfG&rank=1
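The printed href carries search-tracking query parameters (from_search, from_srp, qid, rank). For storing one link per book in the Excel sheet, a sketch like the following strips them with urllib.parse. It also builds a search URL per title; that helper rests on an assumption, suggested by the answer locating the search box by name="q", that the search form submits its q field to https://www.goodreads.com/search. Both clean_book_url and search_url are hypothetical helpers.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def clean_book_url(href):
    """Drop the query string (from_search, qid, rank, ...) and any fragment."""
    parts = urlsplit(href)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def search_url(title):
    """Hypothetical: assumes the search form sends its "q" field to /search."""
    return "https://www.goodreads.com/search?" + urlencode({"q": title})


href = ("https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone"
        "?from_search=true&from_srp=true&qid=3nIjRXwsfG&rank=1")
print(clean_book_url(href))   # https://www.goodreads.com/book/show/3.Harry_Potter_and_the_Sorcerer_s_Stone
print(search_url("harry potter"))  # https://www.goodreads.com/search?q=harry+potter
```

For a whole list of titles, one could driver.get(search_url(title)) for each row and reuse the wait from the answer instead of typing into the search box; whether Goodreads serves results at that URL (and whether it requires login) should be checked before relying on it.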
