Python Selenium WebDriver "no such element"

Question

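I'm trying to build a simple scraping loop to get titles from dynamic pages. I wrote a small script that works the way I expect. Here is the working script: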

from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')

url = "https://www.youtube.com/user/LinusTechTips/videos"
driver.get(url)

videos = driver.find_elements_by_xpath('.//*[@id="dismissable"]')

for video in videos:
    title = video.find_element_by_xpath('.//*[@id="video-title"]').text
    print(title)

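It correctly grabs the div containing the title and other details, then extracts the title. However, this script only seems to work on YouTube. I've tried it on Craigslist, Amazon, booktoscrape, Rightmove and Hostelworld, but it doesn't seem to work on any of those pages. Here is the script for Hostelworld: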

from selenium import webdriver
driver = webdriver.Chrome('C:/Users/user/Downloads/chromedriver_win32/chromedriver.exe')

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"

driver.get(url)

cards = driver.find_elements_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]')

for card in cards:
    title = card.find_element_by_xpath('.//*[@id="__layout"]/div/div[1]/div[4]/div/div/div[3]/div[2]/div[1]/h2/a').text
    print(title)

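I'm fairly sure the card's class name is correct, because I searched for it in Chrome DevTools. I believe the title XPath is correct, because it prints correctly if I use it outside the loop. I believe the loop is correct too, because if I change the cards variable to: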

cards = driver.find_elements_by_class_name('property-card')

it prints the title once for each card on the page.

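However, when I add . to the title XPath, it returns the error "Message: no such element: Unable to locate element: ...". I prefix the expression with . so that it searches only within the parent element being iterated over, rather than the whole page. But for some reason, adding . throws an error on every site I've tried except YouTube.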

I'm trying to stick with XPath, because not all sites have good class and id conventions.

Answer 1

To get the titles of all properties, induce WebDriverWait() and wait for visibility_of_all_elements_located() with the following CSS selector.

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)
cards = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card h2.title.title-6>a")))
for card in cards:
    title = card.text
    print(title)

Output:

The Local NYC
HI NYC Hostel
NY Moore Hostel
Broadway Hotel n Hostel
Q4 Hotel
American Dream Hostel
Giorgio Hotel
Freehand New York
West Side YMCA
Hotel 31
Vanderbilt YMCA
Union Hotel Brooklyn
Victorian Inn
Central Park West Hostel
Jazz on the Park Youth Hotel
The Jane
Nesva Hotel
John Hotel

Note that you need the following imports.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Updated to include prices.

url = "https://www.hostelworld.com/s?q=New%20York,%20New%20York,%20USA&country=USA&city=New%20York&type=city&id=13&from=2020-08-14&to=2020-08-16&guests=2&page=1"
driver.get(url)
cards = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.property-card")))
for card in cards:
    try:
        # Look up the title and price relative to the current card only.
        title = card.find_element_by_css_selector("h2.title.title-6>a").text
        print(title)
        price = card.find_element_by_css_selector("p.price.title-5").text
        print(price)
    except Exception:
        # Skip cards that lack a title or price element.
        continue

Output:

The Local NYC
€45
HI NYC Hostel
€41
NY Moore Hostel
€158
Broadway Hotel n Hostel
€73
Freehand New York
€95
Q4 Hotel
€37
Giorgio Hotel
€158
American Dream Hostel
€128
West Side YMCA
€87
Vanderbilt YMCA
€89
Hotel 31
€74
Union Hotel Brooklyn
€128
Victorian Inn
€88
Central Park West Hostel
€42
The Jane
€115
Jazz on the Park Youth Hotel
€78
Nesva Hotel
€136
John Hotel
€165
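
On why the dot-prefixed XPath in the question fails: a leading . makes the expression relative to the current card, so './/*[@id="__layout"]/...' looks for the element with id "__layout" among the card's descendants. That element is an ancestor of the card, so nothing can match. The YouTube script works because the "video-title" element sits inside each "dismissable" element. The sketch below illustrates this with the standard library's xml.etree.ElementTree rather than Selenium, on a made-up, heavily simplified fragment; the markup is hypothetical, and ElementTree returns None where Selenium would raise "no such element".

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page: the element with
# id="__layout" is an ancestor of every card, not a descendant.
PAGE = """
<html>
  <body>
    <div id="__layout">
      <div class="property-card"><h2><a>The Local NYC</a></h2></div>
      <div class="property-card"><h2><a>HI NYC Hostel</a></h2></div>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)
cards = root.findall(".//div[@class='property-card']")

# Card-relative search that still starts at the ancestor id: no match,
# because "__layout" is not somewhere below the card.
from_ancestor = [card.find(".//*[@id='__layout']/div/h2/a") for card in cards]

# Card-relative search that describes only the path below the card.
titles = [card.find(".//h2/a").text for card in cards]

print(from_ancestor)  # [None, None]
print(titles)         # ['The Local NYC', 'HI NYC Hostel']
```

With Selenium, the same idea would mean passing a card-relative path such as './/h2/a' to the lookup on each card. That path is an assumption based on the CSS selector used above, not something verified against the live page.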

