Response header Content-Type: application/xop+xml and parsing with lxml.etree.fromstring
I am receiving a response from a SOAP API that has Content-Type: application/xop+xml in it. I am unsure how to get Response.text into lxml.etree.fromstring so that I can work with the XML effectively.
Here is the Response.text:
--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94
Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"
Content-Transfer-Encoding: binary
Content-ID: <root.message@cxf.apache.org>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><listResponse xmlns="http://www.strongmail.com/services/v2/schema"><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>101</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>102</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>103</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>107</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>108</id></objectId><objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="UserId"><id>109</id></objectId></listResponse></soap:Body></soap:Envelope>
--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--
Taking .text and parsing it with etree.fromstring:
from lxml import etree
resXML = etree.fromstring(theResponse.text)
gives the following:
resXML = etree.fromstring(theResponse.text)
File "src/lxml/etree.pyx", line 3222, in lxml.etree.fromstring
File "src/lxml/parser.pxi", line 1877, in lxml.etree._parseMemoryDocument
File "src/lxml/parser.pxi", line 1758, in lxml.etree._parseDoc
File "src/lxml/parser.pxi", line 1068, in lxml.etree._BaseParser._parseUnicodeDoc
File "src/lxml/parser.pxi", line 601, in lxml.etree._ParserContext._handleParseResultDoc
File "src/lxml/parser.pxi", line 711, in lxml.etree._handleParseResult
File "src/lxml/parser.pxi", line 640, in lxml.etree._raiseParseError
File "<string>", line 1
lxml.etree.XMLSyntaxError: Start tag expected, '<' not found, line 1, column 1
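I believe this is because it expects '<' to be the very first thing in any XML input.
I looked at the lxml.etree documentation at https://lxml.de/tutorial.html#parsing-from-strings-and-files and found .parse, but that is only for files. Looking at the methods of Response, I can see that I can get information about the headers, such as the content type, although the documentation goes on to use json.
Is there some method in Response that extracts just the XML part without the headers, or is there a way to do this in lxml.etree?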
You can handle it this way:
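from io import StringIO
from lxml import etree

# theResponse holds the response body as a string (the Response.text shown above)
parser = etree.HTMLParser()  # lenient parser that tolerates the non-XML lines around the envelope
tree = etree.parse(StringIO(theResponse), parser)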
From this point on, lxml can take care of the rest. As a random example, if you are after the links in the response, you can try:
for i in tree.iter():
    if len(i.values()) > 0:
        print(i.values()[0])
The output will be:
http://schemas.xmlsoap.org/soap/envelope/
http://www.strongmail.com/services/v2/schema
http://www.w3.org/2001/XMLSchema-instance
and so on.
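If you would rather keep a real XML parser, so that the namespaces are treated as namespaces, another option is to cut the envelope out of the multipart body first and pass only that to etree.fromstring. The following is a minimal sketch, not a full MIME parser. It assumes the body starts with the boundary line and that the first part is the SOAP envelope, as in the response above. The helper name extract_xml_part and the shortened SAMPLE_BODY are made up for illustration.

```python
from lxml import etree

# Shortened copy of the response body from the question (two ids only).
SAMPLE_BODY = (
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94\n"
    'Content-Type: application/xop+xml; charset=UTF-8; type="text/xml"\n'
    "Content-Transfer-Encoding: binary\n"
    "Content-ID: <root.message@cxf.apache.org>\n"
    "\n"
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body>"
    '<listResponse xmlns="http://www.strongmail.com/services/v2/schema">'
    '<objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:type="UserId"><id>101</id></objectId>'
    '<objectId xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xsi:type="UserId"><id>102</id></objectId>'
    "</listResponse></soap:Body></soap:Envelope>\n"
    "--uuid:051145c9-9210-4e26-a390-d7cdd06b9f94--\n"
)

# Default namespace declared on <listResponse> in the response above.
NS = "http://www.strongmail.com/services/v2/schema"


def extract_xml_part(text: str) -> str:
    """Return the XML of the first MIME part (hypothetical helper).

    Assumes the first non-empty line is the boundary and that the XML
    begins at the first following line that starts with '<'.
    """
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("empty response body")
    boundary = lines[0]  # e.g. "--uuid:0511..."
    xml_lines = []
    in_xml = False
    for line in lines[1:]:
        if line.startswith(boundary):
            break  # next part, or the closing "--boundary--" line
        if not in_xml and line.lstrip().startswith("<"):
            in_xml = True  # part headers are over, the XML starts here
        if in_xml:
            xml_lines.append(line)
    if not xml_lines:
        raise ValueError("no XML found in the first part")
    return "\n".join(xml_lines)


# Passing bytes avoids problems if the XML carries an encoding declaration.
root = etree.fromstring(extract_xml_part(SAMPLE_BODY).encode("utf-8"))

# <id> elements inherit the default namespace, so use the {namespace}tag form.
ids = [element.text for element in root.iter(f"{{{NS}}}id")]
print(ids)
```

With requests you would call extract_xml_part(theResponse.text) in place of SAMPLE_BODY. If the service can also send binary attachments in later parts, a proper MIME parser working on the raw bytes is probably the safer route.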