How to get the text between tags in an XML document, preferably using lxml
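Here is a sample of the tags. I cannot get the text between them: I can't iterate over the tags, and node.text on the <seg> node does not give me the full text. That is why I'm asking; any help is welcome (sorry for my English).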
<tuv>
<seg>If you want to save items in a
<bpt i="1">&lt;Message id="Message:1T0000772343:f000012900ce8eb3:MPhS"&gt;</bpt>
<ept i="1">&lt;/Message&gt;</ept>
for which no connection has been established or in a
<bpt i="2">&lt;Message id="Message:1T0000772343:f000012900ceac3d:pvy4"&gt;</bpt>
<ept i="2">&lt;/Message&gt;</ept>
that requires authentication, you need to connect to the library.
</seg>
</tuv>
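Wanted output:
If you want to save items in a for which no connection has been established or in a that requires authentication, you need to connect to the library.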
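Use .xpath("text()") on the <seg> element to get all of its direct text nodes, skipping the text inside the child elements.
This code prints the wanted output: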
from lxml import etree

# Parse the file; find() on the tree searches relative to the root <tuv>
tree = etree.parse("tuv.xml")
seg = tree.find("seg")
# Get the direct text nodes of 'seg' as one string
text = " ".join(seg.xpath("text()"))
# Print the result with unwanted whitespace removed
print(" ".join(text.split()))
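As background on why node.text alone falls short: in the ElementTree model that lxml follows, an element's .text holds only the text before its first child, and the text that follows each child is stored on that child's .tail. The sketch below collects those pieces by hand, which should give the same result as the xpath("text()") approach. The direct_text helper and the shortened ids "m1" and "m2" are made up for illustration, and the fallback import is only there so the example also runs where lxml is not installed.

```python
try:
    from lxml import etree  # preferred, as in the answer above
except ImportError:
    # Assumption: the standard library module is an acceptable stand-in here,
    # since it exposes the same .text / .tail attributes.
    import xml.etree.ElementTree as etree

# Shortened copy of the sample from the question (ids are placeholders)
SAMPLE = """<tuv>
<seg>If you want to save items in a
<bpt i="1">&lt;Message id="m1"&gt;</bpt>
<ept i="1">&lt;/Message&gt;</ept>
for which no connection has been established or in a
<bpt i="2">&lt;Message id="m2"&gt;</bpt>
<ept i="2">&lt;/Message&gt;</ept>
that requires authentication, you need to connect to the library.
</seg>
</tuv>"""


def direct_text(elem):
    """Hypothetical helper: text owned directly by elem, whitespace-normalized."""
    # Text before the first child lives on the element itself
    parts = [elem.text or ""]
    # Text after each child lives on that child's .tail
    for child in elem:
        parts.append(child.tail or "")
    # Collapse newlines and repeated spaces into single spaces
    return " ".join(" ".join(parts).split())


seg = etree.fromstring(SAMPLE).find("seg")
print(repr(seg.text))    # only the fragment before the first <bpt>
print(direct_text(seg))  # the full sentence without the tag contents
```

Note that itertext() would not be a substitute here, because it also yields the text inside <bpt> and <ept>, which the wanted output leaves out.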