How to grab the price information from the flight reservation site https://reservations.airarabia.com

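I am new to Python and am trying to learn web scraping. Following a tutorial, I tried to extract the price from the website, but nothing is printed. What is wrong with my code?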

from selenium import webdriver

chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = driver.find_elements_by_class_name("fare-and-services-flight-select-fare-value ng-isolate-scope")
for post in price:
        print(post.text)

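The first reason is that the page you are trying to scrape builds its HTML with JavaScript, so the price element does not exist yet when the page first loads. You need to wait until the element is present, using Selenium's WebDriverWait, before you read it.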

The second reason is that find_elements_by_class_name accepts only a single class name, so to match on several classes you need to use find_elements_by_css_selector or find_elements_by_xpath instead.
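To make the second point concrete, here is a minimal sketch of how a space-separated class attribute maps to a compound CSS selector. The helper name classes_to_css_selector is hypothetical and is not part of Selenium; it only illustrates the string transformation.

```python
def classes_to_css_selector(class_attribute: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration only; it is not a Selenium API.
    """
    class_names = class_attribute.split()
    if not class_names:
        raise ValueError("class attribute is empty")
    # Each class gets a leading dot and the parts are joined without spaces,
    # so the selector matches elements that carry all of the classes.
    return "".join("." + name for name in class_names)


selector = classes_to_css_selector(
    "fare-and-services-flight-select-fare-value ng-isolate-scope"
)
print(selector)
# .fare-and-services-flight-select-fare-value.ng-isolate-scope
```

The resulting string is the kind of value a CSS-selector lookup expects, whereas the original string with a space in it is not a single class name.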

This is how your code should look:

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait

chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)

driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = WebDriverWait(driver, 10).until(
    lambda x: x.find_elements_by_css_selector(".currency-value.fare-value.ng-scope.ng-isolate-scope"))

for post in price:
    print(post.get_attribute("innerText"))
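The waiting step in the code above boils down to polling: the condition is called again and again until it returns something truthy or the timeout expires. The sketch below shows that idea in plain Python. It is a simplified illustration and not Selenium's actual implementation; wait_until and the simulated lookups are assumptions made up for the example.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Simplified stand-in for an explicit wait; raises TimeoutError if the
    condition is still falsy once the timeout has passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose prices appear only after some JavaScript has run:
# the first two lookups find nothing, the third one finds a value.
lookups = iter([[], [], ["475"]])
prices = wait_until(lambda: next(lookups), timeout=2.0, poll_interval=0.01)
print(prices)
# ['475']
```

This also suggests why passing a lambda that returns a list of elements works as a wait condition: an empty list is falsy, so the wait keeps polling until at least one element is found.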

To print the first price, you have to induce WebDriverWait for the desired visibility_of_element_located(), and you can use either of the following:

  • Using CSS_SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "isa-flight-select button:first-child span.fare-and-services-flight-select-fare-value.ng-isolate-scope"))).get_attribute("innerHTML"))
    
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//isa-flight-select//following::button[contains(@class, 'button')]//span[@class='fare-and-services-flight-select-fare-value ng-isolate-scope']"))).text)
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • Console output from two back-to-back executions:

    475
    


Outro

As per the documentation:

  • The get_attribute() method gets the given attribute or property of the element.
  • The text property returns the text of the element.
  • Difference between text and innerHTML using Selenium
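
As a rough illustration of the distinction listed above, the sketch below contrasts a markup string, such as innerHTML would return, with the plain text inside it. The sample markup is hypothetical and not taken from the site, and stripping tags with the standard library only approximates what a browser reports as visible text, since a real browser also takes CSS visibility into account.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect only the text nodes of a markup fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(markup: str) -> str:
    """Return the text content of a markup fragment with the tags removed."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks).strip()


# Hypothetical markup, similar in spirit to what innerHTML might contain.
inner_html = '<span class="currency">AED</span> <span class="value">475</span>'

print(inner_html)              # the markup, tags included
print(strip_tags(inner_html))  # AED 475
```

If the element contains only a bare number, as the console output above suggests, both views give the same string; they differ as soon as the element has child tags.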