Anomaly (cannot find element) on only one of several similar TripAdvisor pages
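Note: this code is incomplete and needs a manual step partway through, so it can only be run in Jupyter. I am trying to get the last page number of a TripAdvisor listing page.
The "Malaysia" and "Switzerland" pages work fine (their URLs are commented out below), but the "Hong Kong" page does not.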
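from selenium import webdriver  # for navigating through the pages
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service(r'C:\Users\user\Downloads\chromedriver.exe'))
url = "https://www.tripadvisor.com.sg/Hotels-g294217-Hong_Kong-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g293951-Malaysia-Hotels.html"
#url = "https://www.tripadvisor.com.sg/Hotels-g188045-Switzerland-Hotels.html"
driver.get(url)
driver.implicitly_wait(5)  # element lookups retry for up to 5 s before failing

# Manual step: on the page, pick any check-in date and check-out date, then click "Update".
# Run the lines below in a separate Jupyter cell afterwards.
# This lookup is the one that fails with "cannot find element" on the Hong Kong page.
last_page_s = driver.find_element(By.CSS_SELECTOR, "span.pageNum.last").get_attribute('data-page-number')
last_page = int(last_page_s)
print(last_page)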
I'm still new to web scraping, so any help is much appreciated!
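Because the failure comes from one page not exposing `span.pageNum.last`, a locator-independent fallback may be worth trying: collect every element that carries a `data-page-number` attribute and take the largest value. This is a minimal sketch, assuming TripAdvisor's pagination elements all expose that attribute (both the question's selector and the answer's selectors read it, but the markup can change at any time). `max_page_number` and `last_page_from_driver` are hypothetical helper names, not part of Selenium.

```python
def max_page_number(values):
    """Return the largest integer among data-page-number strings, or None."""
    numbers = []
    for value in values:
        # Skip missing or non-numeric attribute values (e.g. None or "").
        if value is not None and value.strip().isdecimal():
            numbers.append(int(value))
    return max(numbers) if numbers else None


def last_page_from_driver(driver):
    """Hypothetical helper: read every data-page-number on the current page.

    Assumes a Selenium 4 driver. "css selector" is the string value of
    By.CSS_SELECTOR, so no extra import is needed here.
    """
    elements = driver.find_elements("css selector", "[data-page-number]")
    return max_page_number(el.get_attribute("data-page-number") for el in elements)


# Offline check with made-up attribute values (no browser required).
print(max_page_number(["1", "2", "3", "", None, "30"]))  # 30
```

With a live driver, `last_page_from_driver(driver)` would be called after the manual date selection, in place of the `span.pageNum.last` lookup.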
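To print the last_page number, you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies. Both target the link that follows the pagination separator span instead of relying on span.pageNum.last:
Using CSS_SELECTOR: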
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.separator.cx_brand_refresh_phase2 +a"))).get_attribute('data-page-number'))
Using XPATH:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[contains(@class, 'separator')]//following::a"))).get_attribute('data-page-number'))
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
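`WebDriverWait(driver, 20).until(...)` keeps re-evaluating the expected condition until it returns something truthy or the timeout expires; by default it polls every 0.5 s and ignores `NoSuchElementException` between attempts. The pure-Python sketch that follows imitates that loop to show the idea without a browser. `wait_until` and `ElementNotFound` are hypothetical names, not Selenium APIs, and the real `until()` raises Selenium's `TimeoutException` rather than the built-in `TimeoutError` used here.

```python
import time


class ElementNotFound(Exception):
    """Stand-in for Selenium's NoSuchElementException in this sketch."""


def wait_until(condition, timeout=20.0, poll=0.5, ignored=(ElementNotFound,)):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # hand back the truthy value, like until()
        except ignored:
            pass  # "not found yet" just means "try again"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated pagination that only "appears" on the third poll.
attempts = {"n": 0}


def fake_last_page():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ElementNotFound("pagination not rendered yet")
    return "25"  # pretend data-page-number value


print(int(wait_until(fake_last_page, timeout=2, poll=0.01)))  # 25
```

The point of the explicit wait is that the lookup is retried against a condition (here visibility) instead of failing on the first attempt.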