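XML Parsing help Python lxml, etree, or dom
I have been struggling to parse an XML response described in a library's documentation, but I can't work out a simple way to find the values I want. I'm happy to use any common library.
Sample XML response, as a string: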
<entry
xmlns="http://www.w3.org/2005/Atom"
xmlns:s="http://dev.splunk.com/ns/rest"
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<title>search index</title>
<id>https://localhost:8089/services/search/jobs/mysearch_02151949</id>
<updated>2011-07-07T20:49:58.000-07:00</updated>
<link href="/services/search/jobs/mysearch_02151949" rel="alternate"/>
<published>2011-07-07T20:49:57.000-07:00</published>
<link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/>
<link href="/services/search/jobs/mysearch_02151949/events" rel="events"/>
<link href="/services/search/jobs/mysearch_02151949/results" rel="results"/>
<link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/>
<link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/>
<link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/>
<link href="/services/search/jobs/mysearch_02151949/control" rel="control"/>
<author>
<name>admin</name>
</author>
<content type="text/xml">
<s:dict>
<s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key>
<s:key name="delegate"></s:key>
<s:key name="diskUsage">2174976</s:key>
<s:key name="dispatchState">DONE</s:key>
<s:key name="doneProgress">1.00000</s:key>
<s:key name="dropCount">0</s:key>
<s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key>
<s:key name="eventAvailableCount">287</s:key>
<s:key name="eventCount">287</s:key>
<s:key name="eventFieldCount">6</s:key>
<s:key name="eventIsStreaming">1</s:key>
<s:key name="eventIsTruncated">0</s:key>
<s:key name="eventSearch">search index</s:key>
<s:key name="eventSorting">desc</s:key>
<s:key name="isDone">1</s:key>
I truncated the output. The values I want are the text of these keys:
- name="isDone" (1)
- name="doneProgress" (1.00000)
- name="eventCount" (287)
How can I find these values?
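You can use lxml and XPath:
# tree is the document parsed with lxml.etree
ns = {'s': "http://dev.splunk.com/ns/rest"}
print(tree.xpath("//s:key[@name='isDone']/text()", namespaces=ns))
This prints ['1'] (XPath text() results are strings). Complete example: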
xml = '''
<entry
xmlns="http://www.w3.org/2005/Atom"
xmlns:s="http://dev.splunk.com/ns/rest"
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<title>search index</title>
<id>https://localhost:8089/services/search/jobs/mysearch_02151949</id>
<updated>2011-07-07T20:49:58.000-07:00</updated>
<link href="/services/search/jobs/mysearch_02151949" rel="alternate"/>
<published>2011-07-07T20:49:57.000-07:00</published>
<link href="/services/search/jobs/mysearch_02151949/search.log" rel="search.log"/>
<link href="/services/search/jobs/mysearch_02151949/events" rel="events"/>
<link href="/services/search/jobs/mysearch_02151949/results" rel="results"/>
<link href="/services/search/jobs/mysearch_02151949/results_preview" rel="results_preview"/>
<link href="/services/search/jobs/mysearch_02151949/timeline" rel="timeline"/>
<link href="/services/search/jobs/mysearch_02151949/summary" rel="summary"/>
<link href="/services/search/jobs/mysearch_02151949/control" rel="control"/>
<author>
<name>admin</name>
</author>
<content type="text/xml">
<s:dict>
<s:key name="cursorTime">1969-12-31T16:00:00.000-08:00</s:key>
<s:key name="delegate"></s:key>
<s:key name="diskUsage">2174976</s:key>
<s:key name="dispatchState">DONE</s:key>
<s:key name="doneProgress">1.00000</s:key>
<s:key name="dropCount">0</s:key>
<s:key name="earliestTime">2011-07-07T11:18:08.000-07:00</s:key>
<s:key name="eventAvailableCount">287</s:key>
<s:key name="eventCount">287</s:key>
<s:key name="eventFieldCount">6</s:key>
<s:key name="eventIsStreaming">1</s:key>
<s:key name="eventIsTruncated">0</s:key>
<s:key name="eventSearch">search index</s:key>
<s:key name="eventSorting">desc</s:key>
<s:key name="isDone">1</s:key>
</s:dict>
</content>
</entry>
'''
from lxml import etree

# Parse the XML string above; fromstring() returns the root element.
tree = etree.fromstring(xml)
# Map the "s" prefix to the Splunk REST namespace used by <s:key>.
ns = {'s': "http://dev.splunk.com/ns/rest"}
print(tree.xpath("//s:key[@name='isDone']/text()", namespaces=ns))
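If installing lxml is not an option, the same lookup can probably be done with the standard library's xml.etree.ElementTree, which supports the [@name='...'] predicate and a prefix-to-namespace mapping. The sketch below is only an illustration: SAMPLE is a trimmed copy of the response above, and read_keys is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response, keeping only the keys of interest.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:s="http://dev.splunk.com/ns/rest">
  <content type="text/xml">
    <s:dict>
      <s:key name="doneProgress">1.00000</s:key>
      <s:key name="eventCount">287</s:key>
      <s:key name="isDone">1</s:key>
    </s:dict>
  </content>
</entry>"""

# Prefix "s" must map to the same namespace URI the document declares.
NS = {"s": "http://dev.splunk.com/ns/rest"}


def read_keys(xml_text, names):
    """Return {name: text} for each requested <s:key name="...">.

    Hypothetical helper; a key that is absent maps to None.
    """
    root = ET.fromstring(xml_text)
    values = {}
    for name in names:
        node = root.find(".//s:key[@name='%s']" % name, NS)
        values[name] = node.text if node is not None else None
    return values


job = read_keys(SAMPLE, ["isDone", "doneProgress", "eventCount"])
print(job)
```

The values come back as strings, so convert them yourself (for example int(job["eventCount"])) if you need numbers.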