Get list of titles from h3 class name with Selenium

Question

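I'm trying to get a list of titles after triggering the search, but I keep getting an empty list, even though the XPath looks correct across the various h3 class variations I tried. See the example below, which shows the HTML for one of the titles I'm trying to copy. The class value changes every time, but the title is always inside an h3.

So I tried the code below to extract the list of titles: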

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys  # needed for Keys.ENTER below
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.options import Options

options = Options()
options.set_preference("dom.push.enabled", False)
browser = webdriver.Firefox(options=options)

browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("Flying elephant",Keys.ENTER)
titles = browser.find_elements_by_xpath("//h3[contains(@class,'graf')]")

lista = []
for names in titles:
    print(names.text)
    lista.append(names.text)     

browser.quit()

The code runs, but the list I get back has no elements. Thanks for any tips that help me solve this.

Answer 1

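You need to wait for the elements to become visible after the search. The results are loaded after the search is submitted, so find_elements_by_xpath runs before any matching h3 exists and returns an empty list. Use WebDriverWait() together with visibility_of_all_elements_located().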

browser.get("https://medium.com/search")
browser.find_element_by_xpath("//input[@type='search']").send_keys("Flying elephant", Keys.ENTER)
# Wait up to 20 seconds until every matching h3 is visible
titles = WebDriverWait(browser, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[contains(@class,'graf')]")))

# Collect the visible text of each title
lista = []
for names in titles:
    print(names.text)
    lista.append(names.text)
print(lista)

You need to import the following.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Output:

['A Flying Elephant, a Teacher’s Hugs: 12 Tales of Pandemic Resilience', 'Who Knew Disney Could Do Trippy Even Better Than Pink Floyd?', '#FlightFree2020: Travel Blogging And The Multiplier Effect', 'Bluesky and Dumbo ‘The Flying Elephant’', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.', 'The Flying Elephant. A Tank So Heavy The British Decided Not To Build It.']
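
For reference, here is a minimal sketch of the same approach as one self-contained function, written against the Selenium 4 locator style (find_element_by_xpath was removed in Selenium 4.3). The names fetch_titles and unique_texts are hypothetical helpers, not part of Selenium, and the de-duplication step is an addition prompted by the repeated last title in the output above. The sketch assumes Firefox and a geckodriver that Selenium can locate, and that the page still matches the XPath expressions used in this thread.

```python
def unique_texts(elements):
    """Return the stripped, non-empty .text of each element, in order, without repeats."""
    seen = set()
    result = []
    for element in elements:
        text = element.text.strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result


def fetch_titles(query, timeout=20):
    """Search Medium for `query` and return the visible h3 titles (hypothetical helper)."""
    # Selenium is imported here so unique_texts can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get("https://medium.com/search")
        wait = WebDriverWait(browser, timeout)
        # Wait for the search box instead of assuming it is already present.
        search_box = wait.until(
            EC.element_to_be_clickable((By.XPATH, "//input[@type='search']"))
        )
        search_box.send_keys(query, Keys.ENTER)
        # Raises TimeoutException if no matching h3 becomes visible in time.
        titles = wait.until(
            EC.visibility_of_all_elements_located(
                (By.XPATH, "//h3[contains(@class,'graf')]")
            )
        )
        return unique_texts(titles)
    finally:
        # Close the browser even if a wait times out.
        browser.quit()
```

Calling fetch_titles("Flying elephant") should then return a list like the one above with the repeated entry collapsed, provided the site markup has not changed.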
