Using lxml to get xml data
This should be an easy one for you Python experts!
I have this XML, which I'm trying to parse (it comes from a URL):
```xml
<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
  <copyright>...</copyright>
  <privacy>...</privacy>
  <route>
    <summary>
      <lengthInMeters>5144</lengthInMeters>
      <travelTimeInSeconds>687</travelTimeInSeconds>
      <trafficDelayInSeconds>0</trafficDelayInSeconds>
      <departureTime>2018-01-16T11:16:06+11:00</departureTime>
      <arrivalTime>2018-01-16T11:27:33+11:00</arrivalTime>
      <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
      <historicTrafficTravelTimeInSeconds>687</historicTrafficTravelTimeInSeconds>
      <liveTrafficIncidentsTravelTimeInSeconds>687</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
    <leg>
      <summary>
        <lengthInMeters>806</lengthInMeters>
        <travelTimeInSeconds>68</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:16:06+11:00</departureTime>
        <arrivalTime>2018-01-16T11:17:14+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>68</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>68</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>958</lengthInMeters>
        <travelTimeInSeconds>114</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:17:14+11:00</departureTime>
        <arrivalTime>2018-01-16T11:19:08+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>77</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>114</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>114</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>1798</lengthInMeters>
        <travelTimeInSeconds>224</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:19:08+11:00</departureTime>
        <arrivalTime>2018-01-16T11:22:53+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>181</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>224</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>224</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>1582</lengthInMeters>
        <travelTimeInSeconds>280</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:22:53+11:00</departureTime>
        <arrivalTime>2018-01-16T11:27:33+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>160</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>280</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>280</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <sections>
      <section>
        <startPointIndex>0</startPointIndex>
        <endPointIndex>139</endPointIndex>
        <sectionType>TRAVEL_MODE</sectionType>
        <travelMode>car</travelMode>
      </section>
    </sections>
  </route>
</calculateRouteResponse>
```
I have this script, which I'm trying to use to pull out specific information:
```python
from lxml import etree
import urllib.request


def parseXML(xmlFile):
    """
    Parse the xml
    """
    with urllib.request.urlopen("https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.79990,144.99318476311566:?routeType=shortest&key=xxxx&computeTravelTimeFor=all") as fobj:
        xml = fobj.read()

    # Look at Parent and Child XML organisation as this is where the data is going wrong at the moment
    root = etree.fromstring(xml)

    for appt in root.getchildren():
        for elem in appt.getchildren():
            if not elem.text:
                text = "None"
            else:
                text = elem[0][0].text

            ## This is doing something with the xml based on its tag and value.
            if elem.tag == 'travelTimeInSeconds' and int(text) > 700:
                print('******** Do something with ', elem.tag, ' : ', text)

            print(elem.tag + " => " + text)


if __name__ == "__main__":
    parseXML("example.xml")
```
The only output I get comes from the summary and leg tags.
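Two things seem to explain that output. The loop above only walks two levels (the root's children and their children), so it never reaches elements such as `travelTimeInSeconds`. And even if it did, the response declares a default namespace (`xmlns="http://api.tomtom.com/routing"`), so lxml reports every tag as `{http://api.tomtom.com/routing}name` and a comparison with the bare name is never true. A minimal sketch on a trimmed copy of the posted XML (only two summary fields kept), using `etree.QName(...).localname` to recover the bare name:

```python
from lxml import etree

# Trimmed copy of the posted response: one <route>/<summary> with two fields.
XML = b"""<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
<route><summary>
<lengthInMeters>5144</lengthInMeters>
<travelTimeInSeconds>687</travelTimeInSeconds>
</summary></route>
</calculateRouteResponse>"""

root = etree.fromstring(XML)
summary = root[0][0]  # <route> -> <summary>

for elem in summary:
    # The tag includes the default namespace in "{uri}name" form,
    # so comparing it with the bare name is always False.
    print(elem.tag, elem.tag == "travelTimeInSeconds")
    # QName(...).localname drops the "{uri}" part.
    print(etree.QName(elem).localname, "=>", elem.text)
```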
So, for example, the desired output would be something like this:
```xml
<summary>
  <lengthInMeters>5144</lengthInMeters>
  <travelTimeInSeconds>687</travelTimeInSeconds>
  <trafficDelayInSeconds>0</trafficDelayInSeconds>
  <departureTime>2018-01-16T11:16:06+11:00</departureTime>
  <arrivalTime>2018-01-16T11:27:33+11:00</arrivalTime>
  <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
  <historicTrafficTravelTimeInSeconds>687</historicTrafficTravelTimeInSeconds>
  <liveTrafficIncidentsTravelTimeInSeconds>687</liveTrafficIncidentsTravelTimeInSeconds>
</summary>
```
and, if possible, the same for each leg, like this:
```xml
<leg>
  <summary>
    <lengthInMeters>958</lengthInMeters>
    <travelTimeInSeconds>114</travelTimeInSeconds>
    <trafficDelayInSeconds>0</trafficDelayInSeconds>
    <departureTime>2018-01-16T11:17:14+11:00</departureTime>
    <arrivalTime>2018-01-16T11:19:08+11:00</arrivalTime>
    <noTrafficTravelTimeInSeconds>77</noTrafficTravelTimeInSeconds>
    <historicTrafficTravelTimeInSeconds>114</historicTrafficTravelTimeInSeconds>
    <liveTrafficIncidentsTravelTimeInSeconds>114</liveTrafficIncidentsTravelTimeInSeconds>
  </summary>
</leg>
```
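That is, the data between the XML tags, for example a length of 1582 meters.
How can I change this script to get the information from lengthInMeters, travelTimeInSeconds and those particular children? I especially want the contents of the summary tags, since that is the most valuable information. Thanks!
Thank you for your time!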
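One way to pull specific fields out of every leg is to bind the default namespace to a prefix and pass that mapping to lxml's `findall()`/`findtext()`. The sketch below assumes a trimmed copy of the posted XML (first two legs, two fields each); the prefix `r` and the helper `leg_summaries` are hypothetical names chosen for illustration.

```python
from lxml import etree

# Map an arbitrary prefix ("r") to the response's default namespace.
NS = {"r": "http://api.tomtom.com/routing"}

# Trimmed copy of the posted response: first two legs, two fields each.
XML = b"""<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
<route>
<leg><summary><lengthInMeters>806</lengthInMeters><travelTimeInSeconds>68</travelTimeInSeconds></summary></leg>
<leg><summary><lengthInMeters>958</lengthInMeters><travelTimeInSeconds>114</travelTimeInSeconds></summary></leg>
</route>
</calculateRouteResponse>"""


def leg_summaries(root, fields=("lengthInMeters", "travelTimeInSeconds")):
    """Return one dict per <leg>, mapping each requested field name to its text."""
    result = []
    for summary in root.findall("r:route/r:leg/r:summary", NS):
        # findtext() takes the prefix mapping as the "namespaces" keyword
        result.append({f: summary.findtext(f"r:{f}", namespaces=NS) for f in fields})
    return result


root = etree.fromstring(XML)
for i, leg in enumerate(leg_summaries(root), start=1):
    print(i, leg)
```

The values come back as strings; wrap them in `int()` if you need numbers.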
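Here is the solution I arrived at, based on the answer I received and my own interpretation of it.
Next up: formatting the data, then learning how to pickle it!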
```python
from lxml import etree
import urllib.request


def handleLeg(leg):
    # print this leg as text, or save it to file maybe...
    # encoding="unicode" makes tostring() return str instead of bytes
    text = etree.tostring(leg, pretty_print=True, encoding="unicode")
    print(text)

    # also process individual elements of interest here if we want
    tagsOfInterest = ["noTrafficTravelTimeInSeconds", "lengthInMeters", "departureTime", "trafficDelayInSeconds"]  # whatever
    for child in leg:
        if 'summary' in child.tag:
            for elem in child:
                for item in tagsOfInterest:
                    if item in elem.tag:
                        print(item + " : " + elem.text)


def parseXML(xmlFile):
    """
    Parse the xml
    """
    # note: xmlFile is unused; the data is fetched from the URL below
    with urllib.request.urlopen("https://api.tomtom.com/routing/1/calculateRoute/-37.79205923474775,145.03010268799338:-37.798883995180496,145.03040309540322:-37.807106781970354,145.02895470253526:-37.80320743019992,145.01021142594075:-37.79990,144.99318476311566:?routeType=shortest&key=xxxxx&computeTravelTimeFor=all") as fobj:
        xml = fobj.read()

    # Look at Parent and Child XML organisation as this is where the data is going wrong at the moment
    root = etree.fromstring(xml)

    for child in root:
        if 'route' in child.tag:
            for elem in child:
                if 'leg' in elem.tag:
                    handleLeg(elem)


if __name__ == "__main__":
    parseXML("example.xml")

'''
import pickle
favorite_color = { "lion": "yellow", "kitty": "red" }
pickle.dump( favorite_color, open( "save.p", "wb" ) )
'''
```
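For the pickling step, it is usually simplest to convert the parsed values into plain Python objects (lists, dicts, ints) first and pickle those, rather than the lxml elements. The sketch below also uses `with` so each file is closed as soon as the block ends; the `open(...)` passed directly to `pickle.dump()` in the commented-out snippet is never explicitly closed. The data and the file name `legs.p` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical data: leg summaries already converted to plain dicts
# (values taken from the first two legs in the posted XML).
legs = [
    {"lengthInMeters": 806, "travelTimeInSeconds": 68},
    {"lengthInMeters": 958, "travelTimeInSeconds": 114},
]

# "legs.p" in the temp directory is an arbitrary, hypothetical location
path = Path(tempfile.gettempdir()) / "legs.p"

# "with" closes the file when the block ends, even if dump() raises
with open(path, "wb") as f:
    pickle.dump(legs, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == legs)
```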
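My understanding is that parseXML fetches the data from the website and turns it into an etree, then searches for 'route' and then for 'leg' before each leg is parsed. The tags of interest are used to pick out the right text to display in the interpreter.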
I'm still trying to make sure I get the route's summary tag as well.
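To get the route-level summary as well, a path that names only direct children is one option: `r:route/r:summary` matches the `<summary>` directly under `<route>` (the whole-route totals), while `.//r:summary` matches every summary at any depth, including the one inside each leg. A sketch on a trimmed copy of the posted XML; the prefix `r` is an arbitrary assumption.

```python
from lxml import etree

NS = {"r": "http://api.tomtom.com/routing"}  # "r" is an arbitrary prefix

# Trimmed copy of the posted response: route summary plus the first leg.
XML = b"""<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
<route>
<summary><lengthInMeters>5144</lengthInMeters><travelTimeInSeconds>687</travelTimeInSeconds></summary>
<leg><summary><lengthInMeters>806</lengthInMeters><travelTimeInSeconds>68</travelTimeInSeconds></summary></leg>
</route>
</calculateRouteResponse>"""

root = etree.fromstring(XML)

# Only the <summary> that is a direct child of <route>: the whole-route totals.
route_summary = root.find("r:route/r:summary", NS)
print("route length:", route_summary.findtext("r:lengthInMeters", namespaces=NS))

# ".//" searches all descendants, so this finds the route summary and the leg summary.
print("all summaries:", len(root.findall(".//r:summary", NS)))
```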
The next stage is to put this information into a class/object/dictionary and tidy it up for future use.
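For the class/object step, a dataclass is one lightweight option. The sketch below assumes a hypothetical `LegSummary` class holding a few of the fields, with numbers converted via `int()` and the timestamp parsed with `datetime.fromisoformat()` (Python 3.7+, which accepts the `+11:00` offset). The field selection and names are illustrative choices, not anything defined by lxml or the TomTom API.

```python
from dataclasses import dataclass
from datetime import datetime

from lxml import etree

NS = {"r": "http://api.tomtom.com/routing"}


@dataclass
class LegSummary:  # hypothetical container; pick whichever fields you need
    length_m: int
    travel_time_s: int
    traffic_delay_s: int
    departure: datetime

    @classmethod
    def from_element(cls, summary):
        """Build from a <summary> element whose children are namespace-qualified."""
        def text(name):
            return summary.findtext(f"r:{name}", namespaces=NS)

        return cls(
            length_m=int(text("lengthInMeters")),
            travel_time_s=int(text("travelTimeInSeconds")),
            traffic_delay_s=int(text("trafficDelayInSeconds")),
            departure=datetime.fromisoformat(text("departureTime")),
        )


# Trimmed copy of the posted response: the last leg only.
XML = b"""<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
<route><leg><summary>
<lengthInMeters>1582</lengthInMeters>
<travelTimeInSeconds>280</travelTimeInSeconds>
<trafficDelayInSeconds>0</trafficDelayInSeconds>
<departureTime>2018-01-16T11:22:53+11:00</departureTime>
</summary></leg></route>
</calculateRouteResponse>"""

root = etree.fromstring(XML)
legs = [LegSummary.from_element(s) for s in root.findall("r:route/r:leg/r:summary", NS)]
print(legs[0].length_m, legs[0].departure.isoformat())
```

Because the result is made of plain Python values, a list of these objects can be pickled directly.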
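I don't have access to the TomTom API, so I can't run all of the code you posted, but I did look at the XML string you posted.
Below is some code I used to extract the individual "leg" elements and process them. I just print them as text (you could save them to a file instead), and I also extract specific children and print those.
It isn't clear from your question what you want to do with the data, but perhaps this gives you a starting point.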
```python
from lxml import etree

xml = """<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
  <copyright>...</copyright>
  <privacy>...</privacy>
  <route>
    <summary>
      <lengthInMeters>5144</lengthInMeters>
      <travelTimeInSeconds>687</travelTimeInSeconds>
      <trafficDelayInSeconds>0</trafficDelayInSeconds>
      <departureTime>2018-01-16T11:16:06+11:00</departureTime>
      <arrivalTime>2018-01-16T11:27:33+11:00</arrivalTime>
      <noTrafficTravelTimeInSeconds>478</noTrafficTravelTimeInSeconds>
      <historicTrafficTravelTimeInSeconds>687</historicTrafficTravelTimeInSeconds>
      <liveTrafficIncidentsTravelTimeInSeconds>687</liveTrafficIncidentsTravelTimeInSeconds>
    </summary>
    <leg>
      <summary>
        <lengthInMeters>806</lengthInMeters>
        <travelTimeInSeconds>68</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:16:06+11:00</departureTime>
        <arrivalTime>2018-01-16T11:17:14+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>59</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>68</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>68</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>958</lengthInMeters>
        <travelTimeInSeconds>114</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:17:14+11:00</departureTime>
        <arrivalTime>2018-01-16T11:19:08+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>77</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>114</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>114</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>1798</lengthInMeters>
        <travelTimeInSeconds>224</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:19:08+11:00</departureTime>
        <arrivalTime>2018-01-16T11:22:53+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>181</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>224</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>224</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <leg>
      <summary>
        <lengthInMeters>1582</lengthInMeters>
        <travelTimeInSeconds>280</travelTimeInSeconds>
        <trafficDelayInSeconds>0</trafficDelayInSeconds>
        <departureTime>2018-01-16T11:22:53+11:00</departureTime>
        <arrivalTime>2018-01-16T11:27:33+11:00</arrivalTime>
        <noTrafficTravelTimeInSeconds>160</noTrafficTravelTimeInSeconds>
        <historicTrafficTravelTimeInSeconds>280</historicTrafficTravelTimeInSeconds>
        <liveTrafficIncidentsTravelTimeInSeconds>280</liveTrafficIncidentsTravelTimeInSeconds>
      </summary>
      <points>...</points>
    </leg>
    <sections>
      <section>
        <startPointIndex>0</startPointIndex>
        <endPointIndex>139</endPointIndex>
        <sectionType>TRAVEL_MODE</sectionType>
        <travelMode>car</travelMode>
      </section>
    </sections>
  </route>
</calculateRouteResponse>"""


def handleLeg(leg):
    """
    Handle a single leg element pulled from the main xml block
    """
    # now that we have a leg element, we can handle it as we want.
    # first, print this leg as text, so that we can see what it contains
    # NB we could also just append this text block to a file of "legs"
    # (encoding="unicode" makes tostring() return str rather than bytes)
    text = etree.tostring(leg, pretty_print=True, encoding="unicode")
    print(text)

    # we can see that there are individual elements of interest,
    # held within the "summary" child element
    # for each element of interest, extract the data and print it
    tagsOfInterest = ["noTrafficTravelTimeInSeconds", "lengthInMeters", "departureTime"]  # whatever
    for child in leg:
        if 'summary' in child.tag:
            # we've found the "summary" child,
            # so inspect each of its child element tags
            # to see if it is of interest
            for elem in child:
                for item in tagsOfInterest:
                    if item in elem.tag:
                        # it's of interest...
                        # print it here
                        print(item + " : " + elem.text)


def parseXML(xml):
    """
    Parse the xml
    """
    root = etree.fromstring(xml)
    # look for the main "route" element; there should only be one...
    # do this by checking whether the text "route" appears in the element tag
    # (tags are namespace-qualified, e.g. "{http://api.tomtom.com/routing}route")
    for child in root:
        if 'route' in child.tag:
            # OK, we found a/the route element. Now iterate over its "leg"
            # elements and handle each one
            for elem in child:
                if 'leg' in elem.tag:
                    # this is a "leg" element, so handle it
                    handleLeg(elem)


if __name__ == "__main__":
    parseXML(xml)
```
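As an alternative to testing whether a substring such as `'leg'` appears in each tag, lxml's `xpath()` can address the namespaced elements directly once the default namespace is bound to a prefix. A sketch, assuming the arbitrary prefix `r` and a copy of the posted XML trimmed to the length fields; it also checks that the per-leg lengths add up to the route total.

```python
from lxml import etree

NS = {"r": "http://api.tomtom.com/routing"}  # "r" is an arbitrary prefix

# Trimmed copy of the posted XML: only the length fields are kept.
XML = b"""<calculateRouteResponse xmlns="http://api.tomtom.com/routing" formatVersion="0.0.12">
<route>
<summary><lengthInMeters>5144</lengthInMeters></summary>
<leg><summary><lengthInMeters>806</lengthInMeters></summary></leg>
<leg><summary><lengthInMeters>958</lengthInMeters></summary></leg>
<leg><summary><lengthInMeters>1798</lengthInMeters></summary></leg>
<leg><summary><lengthInMeters>1582</lengthInMeters></summary></leg>
</route>
</calculateRouteResponse>"""

root = etree.fromstring(XML)

# text() returns the matching strings directly; xpath() needs the prefix mapping too
leg_lengths = [int(t) for t in root.xpath("r:route/r:leg/r:summary/r:lengthInMeters/text()", namespaces=NS)]
# string(...) returns the text of the first match as a single string
total = int(root.xpath("string(r:route/r:summary/r:lengthInMeters)", namespaces=NS))

print(leg_lengths, sum(leg_lengths), total)
```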