Python lxml: How to traverse back up a tree
I have the following Python code:
import lxml.etree
root = lxml.etree.parse("../../xml/test.xml")
path="./pages/page/paragraph[contains(text(),'ash')]"
para = root.xpath(path)
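Once I reach the para node, I don't want to go any further. Instead, I want to go back to the root and look at all the <paragraph> nodes. Is there a way to go back up the tree?
Or, to put it another way: I want the subtree between root and para. How do I do that?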
For reference, here is the XML:
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
In this case, I want the nodes XBV and GFH. How can I get them?
.. will take you one level up.
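To make the ".." step concrete, here is a minimal sketch, assuming the XML above is parsed from an inline string instead of ../../xml/test.xml. It shows the XPath route and, for comparison, lxml's element API (getparent() and iterancestors()), which the answer itself does not mention.

```python
import lxml.etree

# The XML from the question, inlined so the sketch runs on its own
data = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = lxml.etree.fromstring(data)

# xpath() returns a list of matches; take the single <paragraph>ash</paragraph>
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# XPath: ".." is one step up (the <page>), "../.." is two steps (the <pages>)
one_up = para.xpath("..")[0].tag
two_up = para.xpath("../..")[0].tag

# Element API: getparent() is the same single step, and
# iterancestors() keeps climbing, nearest ancestor first
parent_tag = para.getparent().tag
ancestor_tags = [a.tag for a in para.iterancestors()]

print(one_up, two_up)   # page pages
print(parent_tag)       # page
print(ancestor_tags)    # ['page', 'pages', 'document']
```

Climbing up only reaches the ancestors of para, though; XBV and GFH live in a sibling branch, which is why a different axis is needed.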
However, I think the preceding axis is what you are looking for:
The preceding axis indicates all the nodes that precede the context node in the document except any ancestor, attribute and namespace nodes.
./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph
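A quick way to see what "except any ancestor" means in practice is to list the tags each axis returns from the matched paragraph. This is an illustrative sketch on the question's XML (inlined as a string); the preceding-sibling line is added only for contrast.

```python
import lxml.etree

data = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = lxml.etree.fromstring(data)
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# preceding:: = nodes before "ash" in document order, minus its ancestors,
# so only the first <page> and its two paragraphs remain
preceding_tags = [e.tag for e in para.xpath("preceding::*")]

# ancestor:: is exactly the part that preceding:: leaves out
ancestor_tags = [e.tag for e in para.xpath("ancestor::*")]

# preceding-sibling:: stays inside the same <page>; "ash" is first there
sibling_texts = [e.text for e in para.xpath("preceding-sibling::paragraph")]

print(preceding_tags)  # ['page', 'paragraph', 'paragraph']
print(ancestor_tags)   # ['document', 'pages', 'page']
print(sibling_texts)   # []
```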
Sample code:
import lxml.etree
data = """
<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>
"""
tree = lxml.etree.fromstring(data)
print([item.text for item in tree.xpath("./pages/page/paragraph[contains(text(),'ash')]/preceding::paragraph")])
Prints:
['XBV', 'GFH']
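If you would rather not use XPath for the "subtree between root and para" part, a plain document-order walk gives the same list here. This is an alternative technique, not the answer's method; paragraphs_before is a hypothetical helper name, and the XML is again inlined.

```python
import lxml.etree

data = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = lxml.etree.fromstring(data)
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]


def paragraphs_before(root, target):
    """Hypothetical helper: texts of <paragraph> elements that come
    before `target` in document order."""
    texts = []
    # iter() walks the tree depth-first, which is document order
    for el in root.iter("paragraph"):
        if el is target:  # stop as soon as we reach the matched node
            break
        texts.append(el.text)
    return texts


walked = paragraphs_before(root, para)
via_xpath = [p.text for p in para.xpath("preceding::paragraph")]
print(walked)               # ['XBV', 'GFH']
print(walked == via_xpath)  # True
```

With this flat layout both approaches agree. They would differ only if paragraph elements could be nested, because preceding:: excludes ancestors while the walk above does not.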
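To go up and collect only the preceding page nodes, then take the paragraph nodes inside them and extract their text (tree is the parsed document from the sample code above):
>>> expression = "./pages/page/paragraph[contains(text(),'ash')]/preceding::page//paragraph"
>>> [i.text for i in tree.xpath(expression)]
['XBV', 'GFH']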
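The same page-level idea can be sketched with lxml's element API: step up to the enclosing page with getparent(), then take its earlier sibling pages with itersiblings(preceding=True). This is an illustrative alternative under one assumption: all page elements share the same <pages> parent. Unlike preceding::page, it will not find pages elsewhere in the document.

```python
import lxml.etree

data = """<document>
<pages>
<page>
<paragraph>XBV</paragraph>
<paragraph>GFH</paragraph>
</page>
<page>
<paragraph>ash</paragraph>
<paragraph>lplp</paragraph>
</page>
</pages>
</document>"""

root = lxml.etree.fromstring(data)
para = root.xpath("./pages/page/paragraph[contains(text(),'ash')]")[0]

# Climb one level to the <page> that contains the match
page = para.getparent()

# itersiblings(preceding=True) yields earlier siblings nearest-first,
# so reverse the list to get back to document order
earlier_pages = list(page.itersiblings("page", preceding=True))[::-1]

# Paragraph texts from those earlier pages, in document order
texts = [p.text for pg in earlier_pages for p in pg.iter("paragraph")]
print(texts)  # ['XBV', 'GFH']
```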