How to scrape all the search results using Selenium webdriver and Python

I'm trying to scrape all the CRD# from the search results on this site: https://brokercheck.finra.org/search/genericsearch/list

(The link requires a fresh search when clicked; just enter anything for an Individual search.)

I'm using driver.find_elements_by_xpath to locate all the CRD numbers on each results page. However, I've been trying paths for a while and the webdriver still can't get the CRDs from the site.

I currently have (in Python):

crds = driver.find_elements_by_xpath("//md-list-item/div/div/div/div/div/bc-bio-geo-section/div/div/div/div/div/span")

But the result is always empty.

I also tried .find_elements_by_css_selector like this:

crds = driver.find_elements_by_css_selector("span[ng-bind-html='vm.item.id']")

To print all the CRD# from the search results on the website https://brokercheck.finra.org/search/genericsearch/grid you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR and get_attribute():

    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.ng-binding[ng-bind-html='vm.item.id']")))])
    
  • Using XPATH and the text attribute:

    print([my_elem.text for my_elem in WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[starts-with(., 'CRD')]//following-sibling::span[1]")))])
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
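For completeness, the pieces above can be assembled into one script. This is a minimal sketch, assuming a Chrome driver and that the Individual search has already been performed (that step is left as a placeholder); `clean_crds` is a hypothetical helper, added here only to filter the scraped cell texts down to numeric CRD values:

```python
# CSS selector for the CRD# cell, as used in the answer above.
CRD_CSS = "span.ng-binding[ng-bind-html='vm.item.id']"
SEARCH_URL = "https://brokercheck.finra.org/search/genericsearch/grid"

def clean_crds(raw_texts):
    """Keep only entries that look like CRD numbers (all digits)."""
    return [t.strip() for t in raw_texts if t.strip().isdigit()]

if __name__ == "__main__":
    # Selenium is only needed when actually driving a browser.
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()
    try:
        driver.get(SEARCH_URL)
        # ... perform the Individual search here, then wait for the
        # Angular grid to render the result rows:
        elems = WebDriverWait(driver, 10).until(
            EC.visibility_of_all_elements_located((By.CSS_SELECTOR, CRD_CSS))
        )
        print(clean_crds([e.text for e in elems]))
    finally:
        driver.quit()
```

Note also that in Selenium 4 the `find_elements_by_xpath` / `find_elements_by_css_selector` helpers were removed in favor of `driver.find_elements(By.XPATH, ...)` and `driver.find_elements(By.CSS_SELECTOR, ...)`, so newer code should use the `By`-based form throughout.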