Xpath

Question

I have a fragment of a table, and I'm trying to get the value "Distributor 10".

<table class="d">
    <tr>
        <td class="ah">supplier<td>
        <td class="ad">
            <a href="/S/3/143.html">Supplier 10</a>
        </td>
    </tr>
    <tr>
        <td class="ah">distributor<pre><td>
        <td class="ad">
            <a href="/D/3/143.html">Distributor 10</a>
        </td>
    </tr>
</table>

In Chrome DevTools, I get this value with the following XPath string:

//tr/td[text()="distributor"]/following-sibling::td[@class="ad"]/a/text()

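But when I write the same thing in Python, it returns an empty list. As far as I can tell, this is because of the <pre> tag next to "distributor". When I change the XPath above to look for "supplier" instead of "distributor", it works fine.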

Any suggestions are welcome.

Answer 1

Assuming you are using lxml, you can get it with either of the following XPath expressions:

//tr[contains(.,"distributor")]//a/text()

//a[parent::td[@class="ad"] and starts-with(@href,"/D")]/text()

A code sample:

from lxml import etree
from io import StringIO
html = '''<table class="d">
    <tr>
        <td class="ah">supplier<td>
        <td class="ad">
            <a href="/S/3/143.html">Supplier 10</a>
        </td>
    </tr>
    <tr>
        <td class="ah">distributor<pre><td>
        <td class="ad">
            <a href="/D/3/143.html">Distributor 10</a>
        </td>
    </tr>
</table>'''

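# Use the lenient HTML parser, since the markup is not well-formed
parser = etree.HTMLParser()
tree = etree.parse(StringIO(html), parser)

# Match the row by its text content instead of relying on sibling cells
data = tree.xpath('//tr[contains(.,"distributor")]//a/text()')
print(data)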

Output: ['Distributor 10']
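The second expression can be checked the same way. The sketch below is only an illustration: it reuses the second row of the markup from the question, and the names row_html and by_href are made up for this example. It also prints the tree as lxml parsed it, which is one way to see where the stray <pre> ended up.

```python
from io import StringIO
from lxml import etree

# Second row of the question's table, kept exactly as malformed as posted
row_html = '''<table class="d">
    <tr>
        <td class="ah">distributor<pre><td>
        <td class="ad">
            <a href="/D/3/143.html">Distributor 10</a>
        </td>
    </tr>
</table>'''

tree = etree.parse(StringIO(row_html), etree.HTMLParser())

# Select the link by its own parent cell and href prefix, ignoring the label cell
by_href = tree.xpath('//a[parent::td[@class="ad"] and starts-with(@href,"/D")]/text()')
print(by_href)

# Dump the repaired tree to inspect how the parser nested the cells
print(etree.tostring(tree, pretty_print=True).decode())
```

Because this expression never looks at the label cell, it does not depend on how the parser repairs the unclosed tags around "distributor".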

Alternative: use lxml's HTML Cleaner class (its "remove_tags" option) to remove the pre elements from the page.
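A simpler variant of the same idea, offered as an assumption and not as what the Cleaner itself does: drop the unclosed <pre> from the raw markup before parsing, so the parser has no open <pre> to nest the following cells in. The plain str.replace below is a stand-in for the Cleaner, and it only suits markup where <pre> carries nothing you need. With the tag gone, the XPath from the question should match the distributor row the same way it matches the supplier row.

```python
from io import StringIO
from lxml import etree

html = '''<table class="d">
    <tr>
        <td class="ah">distributor<pre><td>
        <td class="ad">
            <a href="/D/3/143.html">Distributor 10</a>
        </td>
    </tr>
</table>'''

# Assumption: the stray <pre> holds nothing useful, so remove it from the raw text
cleaned = html.replace('<pre>', '')

tree = etree.parse(StringIO(cleaned), etree.HTMLParser())

# The XPath from the question, applied to the cleaned markup
xpath = '//tr/td[text()="distributor"]/following-sibling::td[@class="ad"]/a/text()'
data = tree.xpath(xpath)
print(data)
```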

References:

Xpath - Retrieveing Text value when condition contains a tag

python-2.7

google-chrome-devtools