Get all text in an lxml node

Question

I am using the following to print all the text inside an element node (not the HTML, but the actual text it contains):

''.join(node.xpath('//div[@class="title_wrapper"]')[0].itertext())

Is there a more concise way to do this?

Answer 1

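You can use XPath's string() function.

If the mixed content contains large runs of whitespace, you can use XPath's normalize-space() function instead, which trims leading and trailing whitespace and collapses each internal run to a single space.

All three examples (yours and my two)...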

Python

from lxml import etree

xml = """<doc>
    <div class="title_wrapper">Some text. Some <span>more</span> text. 
    <span>Even <span>m<span>o</span>re</span> text!</span>
    </div>
</doc>"""

tree = etree.fromstring(xml)

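# The question's approach: join every text fragment under the element
print(''.join(tree.xpath('//div[@class="title_wrapper"]')[0].itertext()))

# XPath string(): the string value of the first matching node
print(tree.xpath('string(//div[@class="title_wrapper"])'))

# XPath normalize-space(): the same text with whitespace trimmed and collapsed
print(tree.xpath('normalize-space(//div[@class="title_wrapper"])'))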

Output

Some text. Some more text. 
    Even more text!

Some text. Some more text. 
    Even more text!

Some text. Some more text. Even more text!
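If you already hold the element, as in the question, the same two functions can be evaluated relative to it by passing "." as the argument. The following is a minimal sketch rather than a definitive implementation; the helper name node_text and its normalize parameter are made up for illustration and are not part of lxml.

```python
from lxml import etree

xml = """<doc>
    <div class="title_wrapper">Some text. Some <span>more</span> text.
    <span>Even <span>m<span>o</span>re</span> text!</span>
    </div>
</doc>"""

tree = etree.fromstring(xml)

# Select the element once; later XPath calls use "." to mean this element.
div = tree.xpath('//div[@class="title_wrapper"]')[0]


def node_text(node, normalize=False):
    """Hypothetical helper: return all text contained in an lxml element.

    With normalize=True, whitespace is trimmed and collapsed via
    XPath normalize-space(); otherwise XPath string() is used as-is.
    """
    expr = 'normalize-space(.)' if normalize else 'string(.)'
    return node.xpath(expr)


raw = node_text(div)                    # same text as ''.join(div.itertext())
clean = node_text(div, normalize=True)  # whitespace collapsed to single spaces

print(repr(raw))
print(clean)
```

Note that string() and normalize-space() applied to a node-set use only the first node in document order, so if the expression can match several elements and you want the text of each, loop over the matched elements and call the helper on each one.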

python

lxml