Read GPX using lxml and xpath
From this post, I know I can use .find(), .findall(), and the .text attribute to get the values nested inside tags.
Take the following .gpx file as an example:
<?xml version="1.0"?>
<gpx version="1.1" creator="Trails 1.28 - https://www.trails.io" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxtpx="http://www8.garmin.com/xmlschemas/TrackPointExtensionv2.xsd" xmlns:trailsio="http://trails.io/GPX/1/0" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://trails.io/GPX/1/0 https://trails.io/GPX/1/0/trails_1.0.xsd">
<metadata>
<time>2016-03-27T06:30:06Z</time>
</metadata>
<trk>
<name><![CDATA[xyz]]></name>
<extensions><trailsio:TrackExtension><trailsio:activity>trekking</trailsio:activity></trailsio:TrackExtension></extensions>
<trkseg>
<trkpt lat="22.491121" lon="114.137634">
<ele>41.270</ele>
<time>2016-03-27T01:21:21Z</time>
</trkpt>
<trkpt lat="22.491104" lon="114.137612">
<ele>42.777</ele>
<time>2016-03-27T01:21:38Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
If I want to get the elevations, I can try:
from lxml import etree

gpx = etree.parse("D:/Users/perry/Downloads/abc.gpx")
# Walk down one level at a time: trk -> trkseg -> trkpt -> ele
ele = gpx.findall("{http://www.topografix.com/GPX/1/1}trk")
ele = [x.findall("{http://www.topografix.com/GPX/1/1}trkseg") for x in ele][0]
ele = [x.findall("{http://www.topografix.com/GPX/1/1}trkpt") for x in ele][0]
ele = [x.findall("{http://www.topografix.com/GPX/1/1}ele") for x in ele]
[x[0].text for x in ele]
The output is ['41.270', '42.777'], which is exactly what I want.
However, I would like to use .xpath(), but each of the following
gpx.xpath("//ele")
gpx.xpath("//{http://www.topografix.com/GPX/1/1}ele")
gpx.xpath("//ele", namespaces = {'ele': "http://www.topografix.com/GPX/1/1"})
either returns [] or raises the error "lxml.etree.XPathEvalError: Invalid expression".
How can I use .xpath() to get the elevations?
Thanks!
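You were on the right track with:
gpx.xpath("//ele", namespaces = {'ele': "http://www.topografix.com/GPX/1/1"})
Because the XML declares a default namespace, the XPath //ele on its own will not find the ele elements, which live in the http://www.topografix.com/GPX/1/1 namespace.
So you do need to register a prefix with the XPath engine, which you have done. However, you then have to use that registered prefix when referring to the element. In your attempt the prefix 'ele' is registered, but the expression //ele never uses it, so it still looks for an ele element in no namespace. The following will work: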
gpx.xpath("//gpx:ele", namespaces = {'gpx': "http://www.topografix.com/GPX/1/1"})
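To see the whole thing end to end, here is a minimal sketch, assuming lxml is installed. It inlines a trimmed copy of the sample file so it runs without the file on disk; the names SAMPLE, GPX_NS and ns are only illustrative, and the prefix "gpx" is an arbitrary choice that just has to match between the expression and the namespaces mapping.

```python
from lxml import etree

GPX_NS = "http://www.topografix.com/GPX/1/1"

# Trimmed copy of the sample file, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0"?>
<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <trkseg>
      <trkpt lat="22.491121" lon="114.137634"><ele>41.270</ele></trkpt>
      <trkpt lat="22.491104" lon="114.137612"><ele>42.777</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""

root = etree.fromstring(SAMPLE)

# Map a prefix of our choosing to the document's default namespace.
ns = {"gpx": GPX_NS}

# Unprefixed name: matches only elements in no namespace, so nothing is found.
no_prefix = root.xpath("//ele")

# Prefixed name: matches the ele elements in the GPX namespace.
elevations = [e.text for e in root.xpath("//gpx:ele", namespaces=ns)]

# Same query, but let XPath return the text nodes directly.
texts = [str(t) for t in root.xpath("//gpx:ele/text()", namespaces=ns)]

print(no_prefix)
print(elevations)
print(texts)
```

With a file on disk, the same expressions should work on the tree returned by etree.parse(), as in the question.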