Selenium: Extract attribute values from HTML Table

Link to the page that contains the HTML table

This is the HTML source:

<!DOCTYPE html>
    <html>
    <head>
    <body onkeyup="return key_up(event,'dwhswdf_org')" onload="onLoad()" style="padding: 0px;">
    <script type="text/javascript">
    <head>
    <body>
    <div id="dark"></div>
    <div id="light"></div>
    <div id="wrapper">
    <div id="cattext"></div>
    <div id="titletext"></div>
    <div id="tabstext"></div>
    <br>
    <table width="1%" cellspacing="0" cellpadding="0" border="0">
    <tbody>
    <tr>
    <td width="1%" valign="top">
    <b>Details</b>
    <table width="1%" cellspacing="0" cellpadding="0" border="0">
    <tbody>
    <tr>
    <td>Site no.</td>
    <td>G0010005</td>
    </tr>
    <tr>
    <td>Site commence</td>
    <td>09/08/1965</td>
    </tr>
    <tr>
    <td>Zero gauge</td>
    <td>0</td>
    </tr>
    <tr>
    <td>Datum</td>
    <td>GD</td>
    </tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    <tr>
    </tbody>
    </table>
    </td>
    <td valign="top" align="left">
    </tr>
    <tr>
    </tbody>
    </table>
    </div>
    <style type="text/css">
    </body>
    <script>
    </body>
    </html>

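My question is how to extract the HTML table values G0010005, 09/08/1965, and 0, which sit next to the labels 'Site no.', 'Site commence', and 'Zero gauge' respectively, using the selenium package in Python. I tried a few different approaches, but none of them worked for me. Below is the code I have written so far...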

>>> from selenium import webdriver
>>> driver  = webdriver.Firefox()
>>> driver.get("https://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&")
>>> tbl = driver.find_element_by_xpath("//html/body/body/div[3]/table/tbody/tr[1]/td[1]/table/tbody/tr[1]/td[2]")
>>> tb1.get_attribute()

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    get_attribute(tb1)
NameError: name 'get_attribute' is not defined

>>> tbl = driver.find_element_by_name("Site no.")

Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    tbl = driver.find_element_by_name("Site no.")
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 365, in find_element_by_name
    return self.find_element(by=By.NAME, value=name)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 752, in find_element
    'value': value})['value']
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "C:\Python27\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
NoSuchElementException: Message: Unable to locate element: [name="Site no."]

>>> tbl = driver.find_element_by_text('Site no.')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    tbl = driver.find_element_by_text('Site no.')
AttributeError: 'WebDriver' object has no attribute 'find_element_by_text'

Any help is appreciated.

Try the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By


browser = webdriver.Firefox()
browser.get('http://water.nt.gov.au/cgi/webhyd.pl?dwhswdf_org=G0010005&cat=dwhsw&lvl=1&')

# Each XPath finds the label cell by its text, then takes the first <td> after it.
# find_element(By.XPATH, ...) works on old and current Selenium releases;
# the find_element_by_* helpers were removed in recent Selenium 4 releases.
siteNoEle = browser.find_element(By.XPATH, "//td[text()='Site no.']/following-sibling::td[1]")
siteNo = siteNoEle.text
print(siteNo)

siteCommenceEle = browser.find_element(By.XPATH, "//td[text()='Site commence']/following-sibling::td[1]")
siteCommence = siteCommenceEle.text
print(siteCommence)

zeroEle = browser.find_element(By.XPATH, "//td[text()='Zero gauge']/following-sibling::td[1]")
zero = zeroEle.text
print(zero)

browser.quit()
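
The three lookups above share one pattern: find the label cell, then read the cell that follows it. That pattern could be folded into a small helper. The following is only a sketch: the names label_xpath and read_details are hypothetical, the label is assumed to contain no single quote, and the locator strategy is passed as the plain string "xpath" (the value behind Selenium's By.XPATH constant) so the helper itself needs no Selenium import.

```python
def label_xpath(label):
    """Return an XPath for the first <td> after the <td> whose text is `label`.

    Assumption: `label` contains no single quote, so it can be embedded in
    a single-quoted XPath string literal as-is.
    """
    if "'" in label:
        raise ValueError("label must not contain a single quote")
    # normalize-space(.) trims and collapses whitespace in the cell text.
    return "//td[normalize-space(.)='{0}']/following-sibling::td[1]".format(label)


def read_details(driver, labels):
    """Map each label to the stripped text of the cell that follows it."""
    values = {}
    for label in labels:
        # "xpath" is the string behind selenium's By.XPATH constant.
        cell = driver.find_element("xpath", label_xpath(label))
        values[label] = cell.text.strip()
    return values
```

With a live session it would be called as read_details(browser, ['Site no.', 'Site commence', 'Zero gauge']) before browser.quit(), and it returns a dictionary keyed by label.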

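Suggestions:

  1. tbl = driver.find_element_by_name("Site no."): this method matches elements whose HTML carries name="Site no.". In the given HTML, Site no. is not the value of any `name` attribute; it is the text of a td cell. So you cannot use it here.
  2. tbl = driver.find_element_by_text('Site no.'): WebDriver defines no such method. The closest existing method is find_element_by_link_text, which locates links (a tags) by their text and does not apply to other HTML elements. Apart from links, there is no dedicated locator method for finding an element by its text; an XPath text() test, as in the code above, does that job.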
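
The first point can be checked offline, without a browser, by scanning the markup for name attributes and cell text. This is a rough sketch in Python 3 using only the standard library; the class NameAndTextScanner and the SAMPLE string (a trimmed copy of two rows from the question's table) are made up for illustration.

```python
from html.parser import HTMLParser


class NameAndTextScanner(HTMLParser):
    """Record every name="..." attribute value and the text of every <td>."""

    def __init__(self):
        super().__init__()
        self.names = []        # values of all name attributes seen
        self.cell_texts = []   # text content of each <td>, in document order
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "name":
                self.names.append(value)
        if tag == "td":
            self._in_td = True
            self.cell_texts.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cell_texts[-1] += data


# Trimmed copy of two rows from the table in the question.
SAMPLE = """
<table><tbody>
<tr><td>Site no.</td><td>G0010005</td></tr>
<tr><td>Site commence</td><td>09/08/1965</td></tr>
</tbody></table>
"""

scanner = NameAndTextScanner()
scanner.feed(SAMPLE)
print("Site no." in scanner.names)       # False: no element has name="Site no."
print("Site no." in scanner.cell_texts)  # True: it is the text of a <td>
```

Since the label only ever shows up as cell text, a name-based locator has nothing to match, while a text-based XPath does.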