Access nested children in xml file parsed with ElementTree

Question

I am new to XML parsing. This xml file has the following tree:

FHRSEstablishment
 |--> Header
 |    |--> ...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...
 |--> EstablishmentCollection
 |    |--> EstablishmentDetail
 |    |    |-->...
 |    |--> Scores
 |    |    |-->...

But when I access it with ElementTree and look for the child tags and attributes,

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(
   urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
   print child.tag, child.attrib

I only get:

Header {}
EstablishmentCollection {}

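I assume this means their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?

EDIT

Thanks to the answer below I can get inside the tree, but when I try to retrieve the values in Scores, this fails: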

for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating

and produces

None
None
None

Why is that?

Answer 1

You have to call iter() on your root.

That is, root.iter() will do the trick!

import xml.etree.ElementTree as ET
import urllib2
tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
   print child.tag, child.attrib

Output:

FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
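
As a minimal sketch of the difference between looping over the root and calling iter(), the snippet below parses a small inline document instead of downloading the real file. It is written for Python 3, and SAMPLE_XML is a made-up fragment that only mimics the tree from the question. It also suggests why attrib prints as {}: attrib only holds XML attributes, and in a layout like this one the values are stored as element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data shaped like the tree in the question.
SAMPLE_XML = """\
<FHRSEstablishment>
  <Header>
    <ItemCount>1</ItemCount>
  </Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <Scores>
        <Hygiene>5</Hygiene>
      </Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE_XML)

# Looping over an element yields only its direct children.
direct_children = [child.tag for child in root]

# iter() yields the element itself and every descendant, in document order.
all_tags = [elem.tag for elem in root.iter()]

# No element here has XML attributes, so attrib is an empty dict;
# the data lives in .text instead.
item_count = root.find('Header/ItemCount')

print(direct_children)
print(all_tags)
print(item_count.attrib, item_count.text)
```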

To get all the tags inside EstablishmentDetail, you need to find that tag and then loop over its children!

That is, for example:

for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib

Output:

FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
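
Note that find() returns only the first matching element, so the loop above walks the children of the first EstablishmentDetail only. A rough sketch of visiting every establishment with findall() follows. It assumes Python 3, and SAMPLE_XML is an invented two-entry fragment, not the real data.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with two establishments.
SAMPLE_XML = """\
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <RatingValue>5</RatingValue>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe B</BusinessName>
      <RatingValue>3</RatingValue>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE_XML)

# find() stops at the first match.
first = root.find('.//EstablishmentDetail')

# findall() returns a list with every match.
details = root.findall('.//EstablishmentDetail')

# findtext() returns the text of the first matching child of each element.
ratings = [(d.findtext('BusinessName'), d.findtext('RatingValue')) for d in details]

print(first.findtext('BusinessName'))
print(ratings)
```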

To get the Hygiene score you mentioned in the comment:

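What your code does is take the first Scores tag; when you write for each in root.find('.//Scores'): rating = each.get('Hygiene'), the loop variable takes the child tags Hygiene, ConfidenceInManagement and Structural in turn. The lookup then asks each of those children for an attribute named Hygiene, and obviously none of the three has one, which is why you get three None values!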
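
The behaviour described above can be reproduced on a small inline document. This is only a sketch for Python 3; SAMPLE_XML and its score values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the numbers are invented.
SAMPLE_XML = """\
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <Scores>
        <Hygiene>5</Hygiene>
        <ConfidenceInManagement>10</ConfidenceInManagement>
        <Structural>0</Structural>
      </Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE_XML)

# find() returns the first Scores element only.
scores = root.find('.//EstablishmentDetail/Scores')

# Iterating over Scores yields its three child elements.
child_tags = [node.tag for node in scores]

# attrib.get('Hygiene') looks for an XML attribute called "Hygiene" on each
# child; none exists, so every lookup gives None.
attr_lookups = [node.attrib.get('Hygiene') for node in scores]

# The scores are stored as element text, keyed here by tag name.
texts = {node.tag: node.text for node in scores}

print(child_tags)
print(attr_lookups)
print(texts['Hygiene'])
```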

You need to:
- first find all the Scores tags;
- then find Hygiene inside each tag you found!

for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text

Output:

Answer 2

Hope this is useful:

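import xml.etree.ElementTree as etree
with open('filename.xml') as tmpfile:
    # iterparse yields (event, element) pairs while the file is being read
    doc = etree.iterparse(tmpfile, events=("start", "end"))
    doc = iter(doc)
    # the first event is the "start" of the root element
    event, root = next(doc)
    for event, elem in doc:
        print event, elem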
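
To connect the incremental approach above to the Hygiene question, here is a hedged sketch for Python 3 that listens for "end" events only and reads each EstablishmentDetail once it is complete. SAMPLE_XML is invented sample data, and io.BytesIO stands in for a real file object.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data; names and scores are invented.
SAMPLE_XML = b"""\
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Cafe A</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Cafe B</BusinessName>
      <Scores><Hygiene>0</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

hygiene_by_business = {}

# An "end" event fires once an element and all of its children are parsed.
for event, elem in ET.iterparse(io.BytesIO(SAMPLE_XML), events=("end",)):
    if elem.tag == 'EstablishmentDetail':
        name = elem.findtext('BusinessName')
        hygiene_by_business[name] = elem.findtext('Scores/Hygiene')
        # Drop the children that were just read to keep memory use low.
        elem.clear()

print(hygiene_by_business)
```

For a file of this size a plain ET.parse() is probably simpler; iterparse mainly pays off when the document is too large to hold in memory comfortably.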
