lxml: Get field after attribute value

Question

I am parsing XML files and have a follow-up to an earlier question. Given the following XML field:

<enrollment type="Anticipated">30</enrollment>

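I want to pull out both the word "Anticipated" and the number. Across my files the element name 'enrollment' and its attribute name 'type' stay the same, but the attribute value 'Anticipated' does not (sometimes it is 'Actual' or something else), and the number varies as well.

The code I tried: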

from lxml import etree
import sys
import glob
list_to_get = ['enrollment']
list_of_files = glob.glob('*xml')
for each_file in list_of_files:
#    try:
        tree = etree.parse(each_file)
        root = tree.getroot()
        for node in root.xpath("//" + 'enrollment'):
            for e in node.xpath('descendant-or-self::*[not(*)]'):
                if e.attrib:
                        print e.attrib
                        print e.find('type')
                        print e.find('.//type')
                        print e.attrib['type']
                        print e.find(e.attrib['type']).text

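With this approach I can extract the type (e.g. Anticipated/Actual), but I cannot find any way to extract the number. If anyone knows the print line I should use, I would appreciate it.

I did look at some similar questions (e.g. here), but their suggestions did not seem to work for me.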

Answer 1

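What you are doing is correct, just don't overcomplicate it. Put simply, use xpath to get the root node, then iterate over its child nodes with getiterator (deprecated in current lxml in favor of iter); the value of each node is available as tag.text.

Example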

parent
    child
    child

for i in parent.iter():  # iter() replaces the deprecated getiterator()
    print(i.tag)   # tag name: parent itself first, then each descendant
    print(i.text)  # text content of that element
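
Applied to the element from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the wrapping `<study>` element and the helper name `get_enrollments` are made up for the example, and the fallback import of the standard-library `xml.etree.ElementTree` is there only so the snippet also runs where lxml is not installed (the calls used here behave the same in both).

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

# Hypothetical sample document; only the <enrollment> element is from the question.
SAMPLE = b"""<study>
  <enrollment type="Anticipated">30</enrollment>
</study>"""


def get_enrollments(root):
    """Return a (type, number) pair for every <enrollment> element under root."""
    results = []
    for node in root.iter("enrollment"):
        enrollment_type = node.get("type")   # attribute value, None if missing
        number = (node.text or "").strip()   # text between the tags, e.g. "30"
        results.append((enrollment_type, number))
    return results


root = etree.fromstring(SAMPLE)
enrollments = get_enrollments(root)
print(enrollments)
```

In the loop from the question, the missing print line would then be something like `print(e.text)`: the number is the text of the element itself, not a child element named after the attribute value, which is why `e.find(e.attrib['type'])` finds nothing.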

python

xml

parsing

lxml