How do I use Selenium to scrape text from a text node within a class through Python

Question

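I have some HTML that I'm scraping with Selenium, and I want to scrape the text inside the small tags. I can't use XPath as in other examples, because the XPath changes. Here is the HTML: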

<h3 class="price">
    .04
<small>ex</small><br> <small>.84 <small>inc</small></small></h3>

I know you can use price = driver.find_elements_by_class_name("price") and then price[1].text to get the text, but I end up with a Selenium WebDriver element instead:

<selenium.webdriver.remote.webelement.WebElement (session="a95cede569123a83f5b043cd5e138c7c", element="a3cabc71-e3cf-4faa-8281-875f9e47d6a4")>

Is there a way to scrape the 30.84 text?

Answer 1

The text 30.84 is inside a text node. So to print it, you have to use WebDriverWait with visibility_of_element_located(), and then you can use either of the following approaches:

Using XPATH and the first child node:

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]")))
print(driver.execute_script('return arguments[0].firstChild.textContent;', element).strip())
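Why firstChild works: in the DOM, the outer <small> has two child nodes, the text node holding the price and the nested <small>inc</small> element, so reading only the first child skips the label. A minimal browser-free sketch of that idea, assuming a sample fragment modeled on the question's markup with the values 30.04/30.84, and using Python's standard html.parser to build a tiny node tree. The helper names (Node, TreeBuilder, price_via_first_child) are hypothetical, and this tree only approximates how a browser builds the DOM.

```python
from html.parser import HTMLParser

# Assumed sample fragment modeled on the question; 30.04 / 30.84 are illustrative values.
SAMPLE_HTML = """<h3 class="price">
    30.04
<small>ex</small><br> <small>30.84 <small>inc</small></small></h3>"""

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """Hypothetical DOM stand-in: an element has a tag, a text node has text."""

    def __init__(self, parent=None, tag=None, attrs=None, text=None):
        self.parent = parent
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree, keeping each text run as its own child node."""

    def __init__(self):
        super().__init__()
        self.root = Node(tag="#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(self.current, tag=tag, attrs=attrs)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements such as <br> never get children
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(Node(self.current, text=data))


def elements(node):
    """Yield every element below node in document order."""
    for child in node.children:
        if child.tag is not None:
            yield child
            yield from elements(child)


def text_content(node):
    """Concatenate all descendant text, like DOM textContent."""
    if node.text is not None:
        return node.text
    return "".join(text_content(c) for c in node.children)


def price_via_first_child(html):
    """Mimic //h3[@class='price']//small[.//small[text()='inc']] + firstChild.textContent."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    for h3 in elements(builder.root):
        if h3.tag != "h3" or h3.attrs.get("class") != "price":
            continue
        for small in elements(h3):
            if small.tag != "small":
                continue
            # Does this <small> contain a <small> whose text node is exactly "inc"?
            has_inc = any(
                inner.tag == "small" and any(c.text == "inc" for c in inner.children)
                for inner in elements(small)
            )
            if has_inc and small.children:
                return text_content(small.children[0]).strip()
    return None


print(price_via_first_child(SAMPLE_HTML))  # 30.84
```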

Using XPATH and split():

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='price']//small[.//small[text()='inc']]"))).text.split()[0])
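Reading the outer <small> element's full text, instead of its first text node, also picks up the nested inc label, so the price has to be separated from it. A minimal browser-free sketch, assuming a sample fragment modeled on the question's markup with the values 30.04/30.84 and hypothetical helper names (OuterSmallText, price_via_full_text): it collects the descendant text of each outermost <small> with Python's standard html.parser, similar to DOM textContent, and keeps the first whitespace-separated token of the one ending in inc.

```python
from html.parser import HTMLParser

# Assumed sample fragment modeled on the question; 30.04 / 30.84 are illustrative values.
SAMPLE_HTML = """<h3 class="price">
    30.04
<small>ex</small><br> <small>30.84 <small>inc</small></small></h3>"""


class OuterSmallText(HTMLParser):
    """Collect the full text (like DOM textContent) of each outermost <small>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <small> elements we are currently inside
        self.texts = []  # one entry per outermost <small>

    def handle_starttag(self, tag, attrs):
        if tag == "small":
            if self.depth == 0:
                self.texts.append("")  # start collecting a new outermost <small>
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "small" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


def price_via_full_text(html):
    parser = OuterSmallText()
    parser.feed(html)
    parser.close()
    for text in parser.texts:
        tokens = text.split()
        # The target <small> reads "<price> inc"; keep only the first token.
        if tokens and tokens[-1] == "inc":
            return tokens[0]
    return None


print(price_via_full_text(SAMPLE_HTML))  # 30.84
```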

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Reference

You can find a relevant detailed discussion in:

How to print the partial text from an element using Selenium and Python
