Iteratively reading a specific element from a <table> with Selenium for Python

Question

I'm trying to read information from this table, which changes regularly. The HTML looks like this:

<table class="the_table_im_reading">
  <thead>...</thead>
  <tbody>
    <tr id="uc_6042339">
      <td class="expansion">...</td>
      <td>
        <div id="card_6042339_68587" class="cb">
          <a href="/uniquelink" class="cl" onmouseover="cardHover('somecard');" onmouseout="cardOut()">TEXT I NEED TO READ</a>
      </td>
      <td>...</td>
      more td's
    </tr>
    <tr id="uc_6194934">...</tr>
      <td class="expansion">...</td>
      similar as the first <tr id="uc...">

I can reach the table with the following:

table_xpath = '//*[@id="content-wrapper"]/div[5]/table'
table_element = driver.find_element_by_xpath(table_xpath)

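I'm trying to read the text in each unique <tr id="uc_unique number">. The number part of id=uc_unique number changes regularly, so I can't use find element by id.

Is there a way to reach that element and read that specific text?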

Answer 1

It looks like you can search by the anchor element's link (its href attribute), since I assume that doesn't change.

Via XPath:

yourText = table_element.find_element_by_xpath(".//a[@href='/blahsomelink']").text

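Update

The OP mentioned that his link changes as well (on every call?), which means the first approach won't work for him.

If you want the text of the first row element, you can try this:

yourText = table_element.find_element_by_xpath(".//tr[1]//a[@class='cl']").text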

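For example, if you know that the link element is always in the second data cell of the first row, and that it is the only link element there, you can do this:

yourText = table_element.find_element_by_xpath(".//tr[1]/td[2]//a").text

Unless you give more detailed requirements about what you're really searching for, this will have to do for now...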

Another update

The OP provided more information about his requirements:

I am trying to get the text in each row.

Given that each tr element contains exactly one anchor element with class cl, you can do the following:

elements = table_element.find_elements_by_xpath(".//tr//a[@class='cl']")
for element in elements:
    row_text = element.text

Now you can do whatever you want with all of those texts...
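
The loop above can be wrapped in a small helper that returns the texts as a list. This is a minimal sketch, assuming Selenium 4, where the find_element_by_* helpers were removed (in 4.3) in favor of find_elements(by, value). The name collect_row_texts is hypothetical, and table_element is assumed to be the table located in the question.

```python
# Locator strategy string; identical to selenium.webdriver.common.by.By.XPATH,
# spelled out here so the sketch has no import-time dependency.
XPATH = "xpath"


def collect_row_texts(table_element):
    """Return the text of every <a class="cl"> found inside the table's rows."""
    # The leading "." keeps the search relative to table_element.
    anchors = table_element.find_elements(XPATH, ".//tr//a[@class='cl']")
    # .text is the rendered text of each anchor; strip stray whitespace.
    return [anchor.text.strip() for anchor in anchors]
```

Usage would then be something like row_texts = collect_row_texts(table_element), after which the list can be printed, filtered, or stored.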

Answer 2

It looks like you have a few options.

If you only want the first A, it could be as simple as

table_element.find_element_by_css_selector("a.cl").text

or, a bit more specifically,

table_element.find_element_by_css_selector("div.cb > a.cl").text

If you want all of the A's, try the find_elements_* versions of the above.
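
As a rough sketch of that plural variant, assuming Selenium 4's find_elements(by, value) call: the helper name link_texts and its default selector are illustrative choices, not part of the original answer.

```python
# Locator strategy string; identical to selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def link_texts(table_element, selector="div.cb > a.cl"):
    """Return the text of every element under table_element matching selector."""
    # find_elements returns an empty list when nothing matches,
    # whereas find_element raises NoSuchElementException.
    links = table_element.find_elements(CSS_SELECTOR, selector)
    return [link.text for link in links]
```

Because an empty result is just an empty list, the caller can check for "no links found" with an ordinary truthiness test instead of a try/except.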

Answer 3

I managed to get the elements I needed by using .get_attribute("textContent") instead of .text, following a tip I found elsewhere.
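
A likely reason this works: .text only returns text that is currently rendered, so anchors in hidden or collapsed rows come back as empty strings, while textContent is read from the DOM regardless of visibility. A minimal sketch of a fallback, with element_text as a hypothetical helper name:

```python
def element_text(element):
    """Return the visible text, falling back to textContent when it is empty."""
    text = element.text
    if text:
        return text.strip()
    # textContent also covers text that is not currently displayed.
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""
```

This could be called on each anchor found in the table, for example [element_text(a) for a in elements].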
