Access nested children in xml file parsed with ElementTree
I am new to XML parsing. This xml file has the following tree:
FHRSEstablishment
|--> Header
| |--> ...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
|--> EstablishmentCollection
| |--> EstablishmentDetail
| | |-->...
| |--> Scores
| | |-->...
But when I access it with ElementTree and look at the child tags and attributes,
import xml.etree.ElementTree as ET
import urllib2

tree = ET.parse(
    urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root:
    print child.tag, child.attrib
I only get:
Header {}
EstablishmentCollection {}
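I assume this means their attributes are empty. Why is that, and how can I access the children nested inside EstablishmentDetail and Scores?
Edit
Thanks to the answers below I can get inside the tree, but if I want to retrieve the values in Scores, this fails:
for node in root.find('.//EstablishmentDetail/Scores'):
    rating = node.attrib.get('Hygiene')
    print rating
and produces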
None
None
None
Why is that?
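You have to call iter() on your root. Iterating over root directly yields only its direct children, whereas root.iter() walks every descendant, so that will do the trick!
import xml.etree.ElementTree as ET
import urllib2

tree = ET.parse(urllib2.urlopen('http://ratings.food.gov.uk/OpenDataFiles/FHRS408en-GB.xml'))
root = tree.getroot()
for child in root.iter():
    print child.tag, child.attrib
Output: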
FHRSEstablishment {}
Header {}
ExtractDate {}
ItemCount {}
ReturnCode {}
EstablishmentCollection {}
EstablishmentDetail {}
FHRSID {}
LocalAuthorityBusinessID {}
...
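To make the difference concrete without downloading anything, here is a minimal sketch that runs on a small inline document. It is written for Python 3 (print is a function there), and the SAMPLE string is a made-up miniature of the real file, not actual FHRS data.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the FHRS file; the values are invented.
SAMPLE = """
<FHRSEstablishment>
  <Header><ItemCount>1</ItemCount></Header>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""

root = ET.fromstring(SAMPLE)

# Iterating an element yields only its direct children.
direct_children = [child.tag for child in root]

# iter() yields the element itself and every descendant, in document order.
all_tags = [elem.tag for elem in root.iter()]

print(direct_children)
print(all_tags)
```

The first list stops at Header and EstablishmentCollection, which matches what the question observed; the second one reaches all the way down to Hygiene.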
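- To get all the tags inside EstablishmentDetail, you need to find that tag and then loop over its children! Note that find() returns only the first match.
For example:
for child in root.find('.//EstablishmentDetail'):
    print child.tag, child.attrib
Output: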
FHRSID {}
LocalAuthorityBusinessID {}
BusinessName {}
BusinessType {}
BusinessTypeID {}
RatingValue {}
RatingKey {}
RatingDate {}
LocalAuthorityCode {}
LocalAuthorityName {}
LocalAuthorityWebSite {}
LocalAuthorityEmailAddress {}
Scores {}
SchemeType {}
NewRatingPending {}
Geocode {}
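Every attrib above prints as {} because, as far as this output shows, the file keeps its values as element text rather than as XML attributes. The sketch below (Python 3) contrasts the two on a made-up fragment; the id attribute is invented purely for the comparison and is not claimed to exist in the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the id attribute is invented for illustration.
detail = ET.fromstring(
    '<EstablishmentDetail id="x1">'
    '<BusinessName>Example Cafe</BusinessName>'
    '<RatingValue>5</RatingValue>'
    '</EstablishmentDetail>'
)

# attrib holds name="value" pairs written inside the start tag.
print(detail.attrib)

# The children have no attributes, so attrib is an empty dict for each.
child_attribs = [child.attrib for child in detail]

# Their values live in .text, the character data between the tags.
fields = {child.tag: child.text for child in detail}
print(fields)
```

So for this kind of document, child.text is the thing to read, and child.attrib will stay empty.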
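- As for getting the Hygiene score you mentioned in the comments:
What your code does is fetch the first Scores tag only. When you then loop with for each in root.find('.//Scores'): rating = each.get('Hygiene'), the loop variable is each child of Scores in turn, that is, the Hygiene, ConfidenceInManagement and Structural tags. None of those three children has an attribute named Hygiene, so get() returns None each time.
What you need to do is:
- Find all the Scores tags.
- Find Hygiene inside each tag you found, and read its text!
for each in root.findall('.//Scores'):
    rating = each.find('.//Hygiene')
    print '' if rating is None else rating.text
Output: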
5
5
5
0
5
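If you also want to know which establishment each score belongs to, one option is to loop over the EstablishmentDetail elements and use findtext() with a relative path. This is only a sketch under assumptions: it is Python 3, the sample data is made up, and hygiene_by_business is a hypothetical helper name, not something from ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second establishment has no Hygiene score.
SAMPLE = """
<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene><Structural>5</Structural></Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Diner</BusinessName>
      <Scores/>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>
"""


def hygiene_by_business(root):
    """Return (business name, hygiene score or None) for each establishment."""
    results = []
    for detail in root.iter('EstablishmentDetail'):
        name = detail.findtext('BusinessName', default='')
        # findtext() returns None when the path matches nothing.
        hygiene = detail.findtext('Scores/Hygiene')
        results.append((name, hygiene))
    return results


root = ET.fromstring(SAMPLE)
pairs = hygiene_by_business(root)
for name, hygiene in pairs:
    print(name, hygiene)
```

Keeping the name and the score in one tuple avoids having to line up two separate result lists afterwards, and the None makes a missing score explicit instead of silently skipping it.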
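Hope this helps:
import xml.etree.ElementTree as etree

with open('filename.xml') as tmpfile:
    # iterparse yields (event, element) pairs while the file is being read
    doc = iter(etree.iterparse(tmpfile, events=("start", "end")))
    # the first "start" event belongs to the root element
    event, root = next(doc)
    for event, elem in doc:
        print event, elem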
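For a large file, the usual reason to reach for iterparse is to handle each record as soon as it is complete and then release it. Below is a hedged Python 3 sketch of that pattern on made-up in-memory data; iter_hygiene is a hypothetical helper name, and a real file object opened in binary mode could be passed in place of the BytesIO.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for the downloaded file.
SAMPLE = b"""<FHRSEstablishment>
  <EstablishmentCollection>
    <EstablishmentDetail>
      <BusinessName>Example Cafe</BusinessName>
      <Scores><Hygiene>5</Hygiene></Scores>
    </EstablishmentDetail>
    <EstablishmentDetail>
      <BusinessName>Example Diner</BusinessName>
      <Scores><Hygiene>0</Hygiene></Scores>
    </EstablishmentDetail>
  </EstablishmentCollection>
</FHRSEstablishment>"""


def iter_hygiene(source):
    """Yield (business name, hygiene score) as each record finishes parsing."""
    # An "end" event fires once the element and all its children are complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "EstablishmentDetail":
            yield elem.findtext("BusinessName"), elem.findtext("Scores/Hygiene")
            # Drop the children we no longer need to keep memory use down.
            elem.clear()


scores = list(iter_hygiene(io.BytesIO(SAMPLE)))
print(scores)
```

Only "end" events are requested here because an element's children are not guaranteed to be available yet when its "start" event arrives.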