How to extract the text $7.56 from the webpage using Selenium through Python

  1. Go to: https://www.goodrx.com/amoxicillin
  2. Right-click $7.56 (or any price) -> Copy XPath in Chrome DevTools

I have tried all of these variations:

find_element(By.XPATH, '// *[ @ id = "uat-price-row-coupon-1"] / div[3] / div[1] / text()')  
find_element(By.XPATH, "//*[@id='uat-price-row-coupon-0']/div[3]/div[1]/text()")  
find_element_by_xpath("//*[@id='uat-price-row-coupon-1']/div[3]/div[1]/text()")  

I have also verified that the XPath works in Firefox "Try Xpath".

But Selenium gives me "no such element".

Am I missing something?

Use WebDriverWait to wait until the elements are visible. The site has bot protection, so be prepared for a captcha.

import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# ... (create the `driver` instance here)

wait = WebDriverWait(driver, 20)
with driver:
    driver.get("https://www.goodrx.com/amoxicillin")

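    # Wait until every price row is visible before reading it.
    rows = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, 'li[data-qa="price_row"]')))
    for row in rows:
        # find_element(By.CSS_SELECTOR, ...) replaces the find_element_by_* helpers removed in Selenium 4.
        store_name = row.find_element(By.CSS_SELECTOR, '[class^="goldAddUnderline"]').text.strip()
        drug_price = row.find_element(By.CSS_SELECTOR, '[data-qa="drug_price"]').text.strip()
        # Keep only the numeric part; the dot is escaped so it matches a literal decimal point.
        drug_price = re.findall(r"\d+\.\d+", drug_price)[0]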
        print(store_name, drug_price)
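
The price-parsing step can be checked on its own, without a browser. The following is a minimal sketch, assuming the cell text looks something like "$7.56"; `parse_price` is a hypothetical helper name, not something from the answer above. It escapes the dot so that only a literal decimal point matches, and it returns None instead of raising when no price is present.

```python
import re


# Hypothetical helper for illustration; not part of the original answer.
def parse_price(cell_text):
    """Return the first decimal number found in cell_text as a string, or None."""
    match = re.search(r"\d+\.\d+", cell_text)
    return match.group(0) if match else None


print(parse_price("$7.56"))
print(parse_price("as low as $12.40 with coupon"))
print(parse_price("Price unavailable"))
```

Returning None rather than indexing `[0]` directly avoids an IndexError on rows that have no price text, for example when a captcha page is served instead of the listing.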

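To extract the text $7.56, note that it is a text node, so you have to induce WebDriverWait for visibility_of_element_located() on the enclosing element and then read the text node through JavaScript. You can use either of the following locator strategies: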

  • Using CSS_SELECTOR:

    driver.get('https://www.goodrx.com/amoxicillin')
    element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul[aria-label='List of best coupons by price and pharmacy.']>li div[data-qa='drug_price']")))
    print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip())
    
  • Using XPATH:

    driver.get('https://www.goodrx.com/amoxicillin')
    element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@aria-label='List of best coupons by price and pharmacy.']/li//div[@data-qa='drug_price']")))
    print(driver.execute_script('return arguments[0].childNodes[1].textContent;', element).strip())
    
  • Console output:

    $7.56
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
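
The `childNodes[1]` lookup in the snippets above picks the second child node of the price element, which this answer treats as a bare text node. The sketch below shows the same idea on a small XML fragment using Python's standard library, so it runs without a browser. The markup in `SAMPLE` is an assumption made up for illustration; the real page structure may differ.

```python
from xml.dom import minidom

# Assumed markup, for illustration only; the real page may differ.
SAMPLE = '<div data-qa="drug_price"><span>as low as</span>$7.56</div>'

div = minidom.parseString(SAMPLE).documentElement

# childNodes lists element nodes and text nodes alike, in document order.
label, price = div.childNodes[0], div.childNodes[1]

# The price is a text node, not an element, so an element locator cannot return it.
is_text_node = price.nodeType == price.TEXT_NODE
price_text = price.data.strip()

print(label.tagName, is_text_node, price_text)
```

This is why the Selenium snippets locate the enclosing element first and then read the text node through `execute_script`: `find_element` only returns elements, so an XPath ending in `/text()` cannot be used as a locator.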