Get the value of specific XML elements and add them to an array in Python
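Taking the XML below as an example, I am trying to get the contents of all <d:LayerXml> tags and add them to an array. I am using ElementTree to parse the XML.
I first tried to access the XML elements by their names, but this failed because apparently there is no element named 'entry':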
# r is assumed to be the HTTP response object whose body is the XML feed
root = ET.fromstring(r.text)
for child in root:
    if child.tag == 'entry':
        print(child.attrib)
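After printing all the child tags (print(child.tag)), I noticed that every child tag is prefixed with the xmlns declared on the root element. For example, 'entry' is actually '{http://www.w3.org/2005/Atom}entry'.
So next I tried to access the elements using that prefix, but this failed with a syntax error: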
root = ET.fromstring(r.text)
for child in root:
    if child.tag == '{http://www.w3.org/2005/Atom}entry':
        layerXML = child.{http://www.w3.org/2005/Atom}content  # the syntax error is reported on this line
        # Also tried - layerXML = child.'{http://www.w3.org/2005/Atom}content'
        print(layerXML)
So, given the following XML sample, how can I add the contents of all <d:LayerXml> elements to an array? To clarify, in this case the array would contain "I want this" and "I want this, too".
<?xml version="1.0" encoding="utf-8"?>
<feed xml:base="https://tablestore.somewhere.com/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml">
<id>https://tablestore.somewhere.com/TableName</id>
<title type="text">TableName</title>
<updated>2017-03-02T12:01:04Z</updated>
<link rel="self" title="TableName" href="TableName" />
<entry m:etag="W/&quot;datetime'2017-03-02T11%3A46%3A37.1271167Z'&quot;">
<id>https://tablestore.somewhere.com/TableName(PartitionKey='PartitonKey',RowKey='layer1-tileMatrixSet')</id>
<category term="tablestore.TableName" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<link rel="edit" title="TableName" href="TableName(PartitionKey='PartitonKey',RowKey='layer1-tileMatrixSet')" />
<title />
<updated>2017-03-02T12:01:04Z</updated>
<author>
<name />
</author>
<content type="application/xml">
<m:properties>
<d:PartitionKey>PartitonKey</d:PartitionKey>
<d:RowKey>RowKey</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">2017-03-02T11:46:37.1271167Z</d:Timestamp>
<d:AuthType>basic</d:AuthType>
<d:Credentials>CREDENTIALS1</d:Credentials>
<d:Layer>layer1</d:Layer>
<d:LayerXml>I want this</d:LayerXml>
<d:Service>https://www.google.co.uk</d:Service>
<d:TileMatrixSet>tileMatrixSet</d:TileMatrixSet>
</m:properties>
</content>
</entry>
<entry m:etag="W/&quot;datetime'2017-03-02T11%3A46%3A37.1271167Z'&quot;">
<id>https://tablestore.somewhere.com/TableName(PartitionKey='PartitonKey',RowKey='layer2-tileMatrixSet')</id>
<category term="tablestore.TableName" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<link rel="edit" title="TableName" href="TableName(PartitionKey='PartitonKey',RowKey='layer2-tileMatrixSet')" />
<title />
<updated>2017-03-02T12:01:04Z</updated>
<author>
<name />
</author>
<content type="application/xml">
<m:properties>
<d:PartitionKey>PartitonKey</d:PartitionKey>
<d:RowKey>RowKey</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">2017-03-02T11:46:37.1271167Z</d:Timestamp>
<d:AuthType>basic</d:AuthType>
<d:Credentials>CREDENTIALS1</d:Credentials>
<d:Layer>layer2</d:Layer>
<d:LayerXml>I want this, too</d:LayerXml>
<d:Service>https://www.google.co.uk</d:Service>
<d:TileMatrixSet>tileMatrixSet</d:TileMatrixSet>
</m:properties>
</content>
</entry>
</feed>
You haven't said which syntax error you got, but the following gives what I think you want:
from xml.etree import ElementTree as ET
xml = '''<?xml version="1.0" encoding="utf-8"?>
<feed xml:base="https://tablestore.somewhere.com/" xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml">
<id>https://tablestore.somewhere.com/TableName</id>
<title type="text">TableName</title>
<updated>2017-03-02T12:01:04Z</updated>
<link rel="self" title="TableName" href="TableName" />
<entry m:etag="W/&quot;datetime'2017-03-02T11%3A46%3A37.1271167Z'&quot;">
<id>https://tablestore.somewhere.com/TableName(PartitionKey='PartitonKey',RowKey='layer1-tileMatrixSet')</id>
<category term="tablestore.TableName" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<link rel="edit" title="TableName" href="TableName(PartitionKey='PartitonKey',RowKey='layer1-tileMatrixSet')" />
<title />
<updated>2017-03-02T12:01:04Z</updated>
<author>
<name />
</author>
<content type="application/xml">
<m:properties>
<d:PartitionKey>PartitonKey</d:PartitionKey>
<d:RowKey>RowKey</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">2017-03-02T11:46:37.1271167Z</d:Timestamp>
<d:AuthType>basic</d:AuthType>
<d:Credentials>CREDENTIALS1</d:Credentials>
<d:Layer>layer1</d:Layer>
<d:LayerXml>I want this</d:LayerXml>
<d:Service>https://www.google.co.uk</d:Service>
<d:TileMatrixSet>tileMatrixSet</d:TileMatrixSet>
</m:properties>
</content>
</entry>
<entry m:etag="W/&quot;datetime'2017-03-02T11%3A46%3A37.1271167Z'&quot;">
<id>https://tablestore.somewhere.com/TableName(PartitionKey='PartitonKey',RowKey='layer2-tileMatrixSet')</id>
<category term="tablestore.TableName" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<link rel="edit" title="TableName" href="TableName(PartitionKey='PartitonKey',RowKey='layer2-tileMatrixSet')" />
<title />
<updated>2017-03-02T12:01:04Z</updated>
<author>
<name />
</author>
<content type="application/xml">
<m:properties>
<d:PartitionKey>PartitonKey</d:PartitionKey>
<d:RowKey>RowKey</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">2017-03-02T11:46:37.1271167Z</d:Timestamp>
<d:AuthType>basic</d:AuthType>
<d:Credentials>CREDENTIALS1</d:Credentials>
<d:Layer>layer2</d:Layer>
<d:LayerXml>I want this, too</d:LayerXml>
<d:Service>https://www.google.co.uk</d:Service>
<d:TileMatrixSet>tileMatrixSet</d:TileMatrixSet>
</m:properties>
</content>
</entry>
</feed>
'''
feed = ET.fromstring(xml)
values = [value.text for value in feed.findall('{http://www.w3.org/2005/Atom}entry/{http://www.w3.org/2005/Atom}content/{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}properties/{http://schemas.microsoft.com/ado/2007/08/dataservices}LayerXml')]
print(values)
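As for the syntax error itself: it most likely comes from child.{http://www.w3.org/2005/Atom}content, because child elements are not Python attributes and have to be looked up with find() or findall() using the namespace-qualified tag. Here is a minimal sketch of the loop from the question rewritten that way. The trimmed-down sample feed and the names ATOM, META, DATA and layer_xml are my own, introduced only for illustration.

```python
from xml.etree import ElementTree as ET

# Namespace URIs from the feed, written in the '{uri}' form that ElementTree uses in tags.
ATOM = '{http://www.w3.org/2005/Atom}'
META = '{http://schemas.microsoft.com/ado/2007/08/dataservices/metadata}'
DATA = '{http://schemas.microsoft.com/ado/2007/08/dataservices}'

# Hypothetical, shortened version of the feed from the question.
sample = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry><content type="application/xml"><m:properties>
    <d:LayerXml>I want this</d:LayerXml>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:LayerXml>I want this, too</d:LayerXml>
  </m:properties></content></entry>
</feed>'''

root = ET.fromstring(sample)
layer_xml = []
for child in root:
    if child.tag == ATOM + 'entry':
        # find() returns the first matching direct child, or None if there is none.
        content = child.find(ATOM + 'content')
        if content is None:
            continue
        node = content.find(META + 'properties/' + DATA + 'LayerXml')
        if node is not None:
            layer_xml.append(node.text)

print(layer_xml)  # ['I want this', 'I want this, too']
```

This keeps the structure of the original loop; the findall() call above does the same thing in one line.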
Actually, it seems you can also use
values = [value.text for value in feed.findall('.//{http://schemas.microsoft.com/ado/2007/08/dataservices}LayerXml')]
or
values = [value.text for value in feed.findall('.//d:LayerXml', { 'd' : 'http://schemas.microsoft.com/ado/2007/08/dataservices' })]
if you don't want to list the full path.
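If you query the feed in several places, it may be tidier to define the prefix map once and reuse it. The sketch below assumes a shortened sample feed; the names NS, SAMPLE and layer_xml_values are hypothetical, and the prefixes in NS are arbitrary labels that only need to map to the right URIs (the default Atom namespace still needs a prefix of its own in the path).

```python
from xml.etree import ElementTree as ET

# Prefix map passed to find()/findall()/iterfind(); the prefix names are my own choice.
NS = {
    'atom': 'http://www.w3.org/2005/Atom',
    'm': 'http://schemas.microsoft.com/ado/2007/08/dataservices/metadata',
    'd': 'http://schemas.microsoft.com/ado/2007/08/dataservices',
}

# Hypothetical, shortened version of the feed from the question.
SAMPLE = '''<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
      xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry><content type="application/xml"><m:properties>
    <d:Layer>layer1</d:Layer>
    <d:LayerXml>I want this</d:LayerXml>
  </m:properties></content></entry>
  <entry><content type="application/xml"><m:properties>
    <d:Layer>layer2</d:Layer>
    <d:LayerXml>I want this, too</d:LayerXml>
  </m:properties></content></entry>
</feed>'''


def layer_xml_values(xml_text):
    """Return the text of every d:LayerXml element, in document order."""
    feed = ET.fromstring(xml_text)
    path = 'atom:entry/atom:content/m:properties/d:LayerXml'
    # An empty element has text None, so fall back to an empty string.
    return [el.text or '' for el in feed.iterfind(path, NS)]


print(layer_xml_values(SAMPLE))  # ['I want this', 'I want this, too']
```

The explicit path only matches LayerXml elements in the expected position, whereas './/d:LayerXml' matches them at any depth; which one is preferable depends on how much you trust the shape of the feed.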