How to grab price information from the flight reservation site https://reservations.airarabia.com
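I am new to Python and trying to learn web scraping. Following a tutorial, I am trying to extract the price from a website, but nothing is printed. What is wrong with my code?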
from selenium import webdriver
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = driver.find_elements_by_class_name("fare-and-services-flight-select-fare-value ng-isolate-scope")
for post in price:
    print(post.text)
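The first reason is that the page you are trying to scrape loads its HTML with JavaScript, so you need to wait until the element exists, using Selenium's WebDriverWait, before you can get it.
The second reason is that the find_elements_by_class_name method accepts only a single class name, so you need to use find_elements_by_css_selector or find_elements_by_xpath instead.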
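To make the second point concrete, here is a minimal sketch of how a space-separated class attribute maps to a compound CSS selector. The helper name classes_to_css_selector is hypothetical (it is not part of Selenium); it only builds the selector string and does not touch a browser.

```python
def classes_to_css_selector(class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper for illustration; not part of Selenium.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("expected at least one class name")
    # ".a.b" matches elements that carry both class "a" and class "b".
    return "".join("." + name for name in names)


# The class string from the question, which is really two class names.
selector = classes_to_css_selector(
    "fare-and-services-flight-select-fare-value ng-isolate-scope"
)
print(selector)
# prints: .fare-and-services-flight-select-fare-value.ng-isolate-scope
```

The resulting string is what a CSS-selector lookup expects. Note that Selenium 4 removed the find_elements_by_* helpers, so on current versions the equivalent call would be driver.find_elements(By.CSS_SELECTOR, selector).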
This is how your code should look:
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
chrome_path = r"C:\webdrivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://reservations.airarabia.com/service-app/ibe/reservation.html#/fare/en/AED/AE/SHJ/KHI/07-09-2019/N/1/0/0/Y//N/N")
price = WebDriverWait(driver, 10).until(
    lambda x: x.find_elements_by_css_selector(".currency-value.fare-value.ng-scope.ng-isolate-scope"))
for post in price:
    print(post.get_attribute("innerText"))
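Why the lambda above works may not be obvious: until() keeps calling the function until it returns something truthy, and an empty list of elements is falsy. The following is a rough stand-alone sketch of that polling idea, not Selenium's actual implementation. The names wait_until, WaitTimeout and find_prices are hypothetical, and the "page" is simulated so the example runs without a browser.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # A non-empty list counts as "found": hand it back to the caller.
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page: the "elements" only show up on the third lookup,
# standing in for HTML that JavaScript renders after the initial load.
lookups = {"count": 0}


def find_prices():
    lookups["count"] += 1
    return ["475"] if lookups["count"] >= 3 else []


prices = wait_until(find_prices, timeout=2.0, poll_interval=0.01)
print(prices)  # prints: ['475']
```

In the real script, the role of find_prices is played by the lambda that runs the CSS-selector lookup against the driver.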
To print the first price, you have to induce WebDriverWait for the desired visibility_of_element_located(), and you can use either of the following:
Using CSS_SELECTOR:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "isa-flight-select button:first-child span.fare-and-services-flight-select-fare-value.ng-isolate-scope"))).get_attribute("innerHTML"))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//isa-flight-select//following::button[contains(@class, 'button')]//span[@class='fare-and-services-flight-select-fare-value ng-isolate-scope']"))).text)
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Console output of two back-to-back executions:
475
You can find a relevant discussion in
Outro
As per the documentation:
- get_attribute() method: gets the given attribute or property of the element.
- text attribute: returns the text of the element.
- Difference between text and innerHTML using Selenium