Match all <td> of a specific table row with Selenium WebDriver - Python

I'm still new to web scraping, and I have a question about WebDriver.

Code sample:

<table>
    <tbody>
        <tr>
            <td> car </td>
            <td> bus </td>
        </tr>
        <tr>
            <td> car </td>
            <td> bus & train </td>
        </tr>
        <tr>
            <td> car </td>
            <td> bus & plane </td>
        </tr>
    </tbody>
</table>

<table>
    <tbody>
        <tr>
            <td> food </td>
            <td> meat</td>
        </tr>
        <tr>
            <td> drink </td>
            <td> water </td>
        </tr>
    </tbody>
</table>

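So the idea is that, in my original code, I have multiple tables with the same ID and class names.

Question: how can I extract all the TRs that contain the word "bus"?

I can't find the right XPath syntax to use.

Using BeautifulSoup: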

from bs4 import BeautifulSoup

# A multi-line string literal needs triple quotes in Python.
html = """<table>
    <tbody>
        <tr>
            <td> car </td>
            <td> bus </td>
        </tr>
        <tr>
            <td> car </td>
            <td> bus & train </td>
        </tr>
        <tr>
            <td> car </td>
            <td> bus & plane </td>
        </tr>
    </tbody>
</table>

<table>
    <tbody>
        <tr>
            <td> food </td>
            <td> meat</td>
        </tr>
        <tr>
            <td> drink </td>
            <td> water </td>
        </tr>
    </tbody>
</table>"""

soup = BeautifulSoup(html, "lxml")

# Collect every <td>, then keep those whose text contains "bus".
temp = soup.find_all("td")

output = [x for x in temp if "bus" in x.text]

//td[contains(text(),'bus')]

You can use contains() on text(); this returns every td whose text contains "bus".
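One way to try such an expression without launching a browser is to run it against the sample markup with lxml, the parser the BeautifulSoup answer already relies on. The sketch below is only an illustration and not part of the original answers: the names SAMPLE_HTML, tree, cell_texts and row_texts are made up, the markup is a condensed copy of the question's tables, and the & is written as &amp; to keep the sample valid. It also shows how the td-level expression can be lifted to whole rows, which is what the question asks for.

```python
from lxml import html as lxml_html

# Condensed copy of the question's markup (hypothetical constant name).
SAMPLE_HTML = """
<table><tbody>
  <tr><td> car </td><td> bus </td></tr>
  <tr><td> car </td><td> bus &amp; train </td></tr>
  <tr><td> car </td><td> bus &amp; plane </td></tr>
</tbody></table>
<table><tbody>
  <tr><td> food </td><td> meat</td></tr>
  <tr><td> drink </td><td> water </td></tr>
</tbody></table>
"""

tree = lxml_html.fromstring(SAMPLE_HTML)

# Cells whose text contains "bus".
cells = tree.xpath("//td[contains(text(), 'bus')]")
cell_texts = [td.text_content().strip() for td in cells]

# Rows that have at least one such cell, as lists of stripped cell texts.
rows = tree.xpath("//tr[td[contains(text(), 'bus')]]")
row_texts = [[td.text_content().strip() for td in tr.xpath("./td")] for tr in rows]

print(cell_texts)
print(row_texts)
```

With this sample, only the three rows of the first table match; the second table has no cell containing "bus".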

To create a list of all the <tr> elements that have a child <td> containing the text bus, you can use the following XPath-based locator strategy:

elements = driver.find_elements(By.XPATH, "//tr[.//td[contains(., 'bus')]]")

Ideally, you should use WebDriverWait with visibility_of_all_elements_located(), using the following locator strategy:

elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//tr[.//td[contains(., 'bus')]]")))

Note: you have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC