How to scrape data from a webpage which uses react.js with Selenium in Python?

Question

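I am having some difficulty scraping a website that uses react.js, and I am not sure why.

This is the website's HTML:

What I want to do is click the button with class: play-pause-button btn btn-naked. However, when I load the page with the Mozilla gecko webdriver, an exception is thrown: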

Message: Unable to locate element: .play-pause-button btn btn-naked

This makes me think I should be doing something else to get hold of this element. Here is my code so far:

driver.get("https://drawittoknowit.com/course/neurological-system/anatomy/peripheral-nervous-system/1332/brachial-plexus---essentials")
# execute script to scroll down the page
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
time.sleep(10)
soup = BeautifulSoup(driver.page_source, 'lxml')
print(driver.page_source)
play_button = driver.find_element_by_class_name("play-pause-button btn btn-naked").click()
print(play_button)

Does anyone know how I can solve this? Any help is much appreciated.

Answer 1

It seems you were close. find_element_by_class_name() does not accept multiple classes; you can pass only a single class name, i.e. only one of the following:

play-pause-button
btn
btn-naked

Passing multiple classes to find_element_by_class_name() fails, as the exception in the question shows.
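One way around the single-class limit is a compound CSS selector, which chains every class onto the tag name with a dot. Here is a minimal sketch of that conversion; the helper name `classes_to_css` is hypothetical and not part of Selenium:

```python
def classes_to_css(tag, class_attr):
    """Build a compound CSS selector from a space-separated class attribute."""
    class_names = class_attr.split()  # split on whitespace, dropping empty pieces
    return tag + "".join("." + name for name in class_names)


selector = classes_to_css("button", "play-pause-button btn btn-naked")
print(selector)  # button.play-pause-button.btn.btn-naked
```

Unlike an exact match on the class attribute, a selector built this way matches regardless of the order in which the classes appear on the element.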

Solution

As an alternative, since the element is rendered dynamically (the page uses react.js), to click() on it you have to induce WebDriverWait for element_to_be_clickable(), and you can use either of the following:

Using CSS_SELECTOR:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button.play-pause-button.btn.btn-naked"))).click()

Using XPATH:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//button[@class='play-pause-button btn btn-naked']"))).click()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
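For intuition about why the explicit wait helps: it repeatedly checks a condition until the condition returns something truthy or a timeout expires, instead of sleeping for a fixed time. The sketch below is a simplified pure-Python illustration of that polling idea, not Selenium's actual implementation; `wait_until`, `button_is_ready`, and the simulated page state are all hypothetical names made up for this example.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition produced
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the "button" only becomes available on the third poll.
attempts = {"count": 0}


def button_is_ready():
    attempts["count"] += 1
    return "play-pause-button" if attempts["count"] >= 3 else None


print(wait_until(button_is_ready, timeout=5.0, poll_interval=0.01))
```

Because the loop returns as soon as the condition succeeds, a generous timeout such as the 20 seconds used above only costs time when the element never shows up.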

