How to find the text based on preceding text or classname using Selenium and Python

Question

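I'm new to web scraping and have been using Selenium for this particular project. In this example, I'm scraping listings on a website, and they are structured like this...

Listing 1: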

<html>
     <div class="div_class">
          <i class="first_i_class" style="i_style"> ::before </i>
          First Category: 
          <span class="span_class">5</span>
          <br>
          <i class="second_i_class" style="i_style"> ::before </i>
          Second Category: 
          <span class="span_class">3</span>
          <br>
     </div>
</html>

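As you can see, the values for the first and second categories look alike, so finding all the elements and then applying a regex won't work here. I need to be able to get the text (in this case 5 and 3) based on the text that precedes it, in this case "First Category: " or "Second Category: ". However, some listings may skip certain categories and look like this...

Listing 2: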

<html>
     <div class="div_class">
          <i class="third_i_class" style="i_style"> ::before </i>
          Third Category: 
          <span class="span_class">7</span>
          <br>
     </div>
</html>

Because the categories differ between listings, I don't think I can use something like:

cat_2_value = browser.find_element_by_xpath("/html/div/span[2][@class='span_class']")

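because the XPath would change as well. Is there a way to find the text in a given span based on

the preceding text (like "First Category: ") or
the preceding class (like "first_i_class")?

Any help or clarifying questions are much appreciated!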

Answer 1

To extract the texts 5, 3, etc. with respect to the preceding classes first_i_class, second_i_class, etc., you need to induce WebDriverWait for visibility_of_element_located(), and you can use the following XPath-based locators:

Printing 5:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='div_class']//i[@class='first_i_class']//following::span[1]"))).text)

Printing 3:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='div_class']//i[@class='second_i_class']//following::span[1]"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
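
Since some listings skip categories, it may help to wrap the wait above in a helper that returns None instead of raising when a category is missing. The following is only a sketch under that assumption: the names xpath_after_class and read_value are hypothetical, the locator is the one used in this answer, and the 20-second default simply mirrors the calls above.

```python
def xpath_after_class(i_class):
    """Build this answer's locator for the span that follows a given <i> class."""
    return (
        "//div[@class='div_class']"
        f"//i[@class='{i_class}']//following::span[1]"
    )


def read_value(driver, i_class, timeout=20):
    """Return the span text after the <i> with class i_class, or None if absent."""
    # Selenium is imported here so the XPath helper above can be used
    # without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath_after_class(i_class))
    try:
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located(locator)
        )
    except TimeoutException:
        # The listing skipped this category (or the element never became visible).
        return None
    return element.text
```

With an already-open driver, read_value(driver, "second_i_class") should give the text of the second span on a page like Listing 1 and None on a page like Listing 2. Each missing category costs the full timeout, so a shorter timeout may be preferable once the page is known to be loaded.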

Answer 2

To complement @DebanjanB's answer, here are some other options. As you asked:

The preceding text (like "First Category: ") :

//span[preceding::text()[1][normalize-space()="First Category:"]]

Output: 5

The preceding class (like "first_i_class") :

//span[preceding-sibling::i[1][@class="first_i_class"]]

or

(//span[preceding-sibling::i[1][contains(@class,"i_class")]])[1]

Output: 5

If you want the second span, replace "first_i_class" with "second_i_class" in the first expression, or change the last [1] to [2] in the second expression.

To get all the span elements directly, use:

//span[preceding-sibling::i[1][contains(@class,"i_class")]]

Output: 5 3 7
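
As a rough illustration of using the text-based expression from Python, the sketch below builds the XPath from a label and collects whichever categories a listing actually contains. The function names xpath_after_label and scrape_categories are hypothetical, and the code assumes the page has already finished loading, because find_elements does not wait.

```python
def xpath_after_label(label):
    """Build the text-based locator for the span that follows `label`."""
    # Assumes `label` contains no double quotes, which would break the expression.
    return f'//span[preceding::text()[1][normalize-space()="{label}"]]'


def scrape_categories(driver, labels):
    """Map each label present on the page to the text of its span."""
    # Selenium is imported here so the XPath helper above can be used
    # without Selenium installed.
    from selenium.webdriver.common.by import By

    values = {}
    for label in labels:
        # find_elements returns an empty list (no exception) when nothing matches,
        # so skipped categories are simply left out of the result.
        matches = driver.find_elements(By.XPATH, xpath_after_label(label))
        if matches:
            values[label] = matches[0].text
    return values
```

Calling scrape_categories(driver, ["First Category:", "Second Category:", "Third Category:"]) on each listing page would then return only the categories that page has, so the caller does not need to know in advance which ones were skipped.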
