How to get the text between tags in an XML file, preferably using lxml

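Here is an example of the tags. I can't get the text between the tags: iterating over the tags doesn't work, and neither does using node.text on the <seg> node. That's why I'm asking; any help is welcome (sorry for my English).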

    <tuv>
         <seg>If you want to save items in a 
            <bpt i="1">&lt;Message id=&quot;Message:1T0000772343:f000012900ce8eb3:MPhS&quot;&gt;</bpt>
            <ept i="1">&lt;/Message&gt;</ept> 
            for which no connection has been established or in a 
            <bpt i="2">&lt;Message id=&quot;Message:1T0000772343:f000012900ceac3d:pvy4&quot;&gt;</bpt>
            <ept i="2">&lt;/Message&gt;</ept> 
            that requires authentication, you need to connect to the library.
         </seg>
    </tuv>

Desired output:

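If you want to save items in a for which no connection has been established or in a that requires authentication, you need to connect to the library.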
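node.text falls short because of the ElementTree data model that lxml shares with the standard library. An element's .text holds only the text before its first child, and any text that follows a child is stored on that child's .tail. The sketch below shows this on a shortened copy of the sample (the id values are placeholders). It uses the standard library's xml.etree.ElementTree so it runs without lxml installed, but lxml.etree elements expose the same .text and .tail attributes.

```python
# Standard-library ElementTree; lxml.etree elements have the same .text/.tail attributes
import xml.etree.ElementTree as ET

# Shortened copy of the sample above; the id values are placeholders
SAMPLE = """<tuv>
  <seg>If you want to save items in a
    <bpt i="1">&lt;Message id=&quot;m1&quot;&gt;</bpt>
    <ept i="1">&lt;/Message&gt;</ept>
    for which no connection has been established.
  </seg>
</tuv>"""

seg = ET.fromstring(SAMPLE).find("seg")

# .text is only the text before the first child element
print(repr(seg.text))

# Text after each child lives on that child's .tail; the child's own content is its .text
for child in seg:
    print(child.tag, repr(child.text), repr(child.tail))
```

Only the first fragment ends up in seg.text. The remaining sentence fragments are spread across the tails of <bpt> and <ept>, while the escaped Message markup is the .text of those children.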

<seg> 元素上使用 .xpath("text()") 获取所有文本节点。

This code prints the desired output:

    from lxml import etree

    # Parse the file; this assumes <tuv> is the root element of tuv.xml
    tree = etree.parse("tuv.xml")
    seg = tree.find("seg")

    # Get the direct child text nodes of <seg> as one string
    text = " ".join(seg.xpath("text()"))

    # Print the result with unwanted whitespace removed (Python 3)
    print(" ".join(text.split()))
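If you would rather avoid XPath, the same direct-child text can be collected by hand from the element's .text and each child's .tail. This is a sketch of an alternative, not part of the original answer. direct_text is a hypothetical helper name. It again uses the standard library's xml.etree.ElementTree so it runs without lxml, but it should work unchanged on lxml.etree elements, which offer the same .text, .tail and child iteration.

```python
import xml.etree.ElementTree as ET  # lxml.etree elements offer the same .text/.tail API


def direct_text(elem):
    """Join elem's own text with the tails of its children, skipping text inside children."""
    parts = [elem.text or ""]
    for child in elem:
        # Text that follows a child element is stored on the child's .tail
        parts.append(child.tail or "")
    # Collapse runs of whitespace (newlines, indentation) into single spaces
    return " ".join(" ".join(parts).split())


SAMPLE = """<tuv>
     <seg>If you want to save items in a
        <bpt i="1">&lt;Message id=&quot;Message:1T0000772343:f000012900ce8eb3:MPhS&quot;&gt;</bpt>
        <ept i="1">&lt;/Message&gt;</ept>
        for which no connection has been established or in a
        <bpt i="2">&lt;Message id=&quot;Message:1T0000772343:f000012900ceac3d:pvy4&quot;&gt;</bpt>
        <ept i="2">&lt;/Message&gt;</ept>
        that requires authentication, you need to connect to the library.
     </seg>
</tuv>"""

seg = ET.fromstring(SAMPLE).find("seg")
print(direct_text(seg))
```

On the full sample this prints the same sentence as the XPath version. Unlike itertext(), which walks all descendants, it skips the text inside <bpt> and <ept>.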