How to query XML data with namespaces using XPath in Python
I am trying to apply an XPath query to XML data that has namespaces, using the following code:
from lxml import etree
from io import StringIO
xml = '''
<gpx creator="udos" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
<metadata>
<time>2015-07-07T15:16:40Z</time>
</metadata>
<trk>
<name>some name</name>
<trkseg>
<trkpt lat="46.3884140" lon="10.0286290">
<ele>2261.8</ele>
<time>2015-07-07T15:30:42Z</time>
</trkpt>
<trkpt lat="46.3884050" lon="10.0286240">
<ele>2261.6</ele>
<time>2015-07-07T15:30:43Z</time>
</trkpt>
<trkpt lat="46.3884000" lon="10.0286210">
<ele>2262.0</ele>
<time>2015-07-07T15:30:46Z</time>
</trkpt>
<trkpt lat="46.3884000" lon="10.0286210">
<ele>2261.8</ele>
<time>2015-07-07T15:30:47Z</time>
</trkpt>
</trkseg>
</trk>
</gpx>
'''
# this is to simulate that the above XML was read from a file
file = StringIO(xml)  # on Python 2 this was StringIO(unicode(xml))
# reading the XML from a file
tree = etree.parse(file)
ns = {'xmlns': 'http://www.topografix.com/GPX/1/1',
      'xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
      'xmlns:gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
      'xmlns:gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
expr = 'trk/trkseg/trkpt/ele'
for element in tree.xpath(expr, namespaces=ns):
    print(element.text)
I expect the code to print:
2261.8
2261.6
2262.0
2261.8
If I replace the XML root element
<gpx creator="udos" version="1.1" xmlns="http://www.topografix.com/GPX/1/1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3">
with
<gpx>
the code works.
Any suggestions on how to make it work with the namespaces as well?
You can define the namespaces as:
ns = {'n': 'http://www.topografix.com/GPX/1/1',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
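This binds the prefix n to 'http://www.topografix.com/GPX/1/1', and you can then use that prefix in your XPath query. Note that the dictionary keys are just the prefixes (n, xsi, gpxtpx, gpxx), not 'xmlns:xsi' and so on. The prefix name itself is arbitrary; it only has to match the one used in the expression. Example: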
expr = 'n:trk/n:trkseg/n:trkpt/n:ele'
for element in tree.xpath(expr, namespaces=ns):
    print(element.text)
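This is needed because the root node's default namespace (xmlns) is 'http://www.topografix.com/GPX/1/1', so all descendant elements automatically inherit it as their namespace, unless an element uses a different prefix or declares a namespace of its own. In an XPath 1.0 expression, however, an unprefixed name such as trk only matches elements in no namespace, and XPath 1.0 has no notion of a default element namespace for the expression, so every step needs an explicit prefix.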
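A minimal sketch of this effect, using the standard library's xml.etree.ElementTree instead of lxml only so that it runs without extra packages. Its findall() path syntax is a limited subset of XPath, but by default it treats unprefixed names the same way (no namespace). The document is a trimmed-down copy of the GPX data above, and the prefix n is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the question's GPX document (one track point only).
GPX_NS = "http://www.topografix.com/GPX/1/1"
xml = (
    '<gpx version="1.1" xmlns="http://www.topografix.com/GPX/1/1">'
    "<trk><trkseg>"
    '<trkpt lat="46.3884140" lon="10.0286290"><ele>2261.8</ele></trkpt>'
    "</trkseg></trk>"
    "</gpx>"
)

root = ET.fromstring(xml)

# The default namespace is part of every element's full name ("Clark notation").
print(root.tag)  # {http://www.topografix.com/GPX/1/1}gpx

# An unprefixed path only matches elements in no namespace, so nothing is found.
print(root.findall("trk/trkseg/trkpt/ele"))  # []

# Binding a prefix of our own choosing to the namespace URI makes the path match.
ns = {"n": GPX_NS}
print([e.text for e in root.findall("n:trk/n:trkseg/n:trkpt/n:ele", ns)])  # ['2261.8']
```

Removing the xmlns attribute from the root (as in the question's experiment) makes root.tag plain "gpx", which is why the unprefixed query then works.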
Example/demo:
In [44]: ns = {'n': 'http://www.topografix.com/GPX/1/1',
....: 'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
....: 'gpxtpx': 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
....: 'gpxx': 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'}
In [45]:
In [45]: expr = 'n:trk/n:trkseg/n:trkpt/n:ele'
In [46]: for element in tree.xpath(expr, namespaces=ns):
   ....:     print(element.text)
....:
2261.8
2261.6
2262.0
2261.8
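For reference, a self-contained Python 3 sketch of the same fix with lxml. Two details differ from the answer and are assumptions of this sketch: instead of hard-coding the URIs, the prefix mapping is built from lxml's root.nsmap (where the default namespace appears under the key None), and the query runs on the root element via root.xpath(). The XML is a shortened copy of the question's data.

```python
from io import StringIO

from lxml import etree

# Shortened copy of the question's GPX data (time elements dropped).
xml = """<gpx creator="udos" version="1.1"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <trk>
    <name>some name</name>
    <trkseg>
      <trkpt lat="46.3884140" lon="10.0286290"><ele>2261.8</ele></trkpt>
      <trkpt lat="46.3884050" lon="10.0286240"><ele>2261.6</ele></trkpt>
      <trkpt lat="46.3884000" lon="10.0286210"><ele>2262.0</ele></trkpt>
      <trkpt lat="46.3884000" lon="10.0286210"><ele>2261.8</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""

tree = etree.parse(StringIO(xml))  # Python 3: no unicode() call needed
root = tree.getroot()

# root.nsmap maps prefixes to URIs; the default namespace is stored under None.
# XPath needs a real prefix, so copy the named prefixes and bind the default
# namespace to "n" (an arbitrary name).
ns = {prefix: uri for prefix, uri in root.nsmap.items() if prefix is not None}
ns["n"] = root.nsmap[None]

# Evaluate relative to the <gpx> root element, prefixing every step with "n:".
elevations = [e.text for e in root.xpath("n:trk/n:trkseg/n:trkpt/n:ele", namespaces=ns)]
print(elevations)  # ['2261.8', '2261.6', '2262.0', '2261.8']
```

Deriving the URI from the document avoids retyping it, but it also accepts whatever default namespace the file happens to declare, and root.nsmap[None] raises KeyError if there is none. Hard-coding the GPX 1.1 URI, as the answer does, is the stricter choice when only that format should match.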