Creating an XSD document from a file download
I am trying to load an XSD document stored on S3. It gives me the following error:
>>> import requests
>>> from lxml import etree
>>> xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
>>> node = etree.fromstring(requests.get(xsd_url).text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "lxml.etree.pyx", line 3092, in lxml.etree.fromstring (src/lxml/lxml.etree.c:70473)
File "parser.pxi", line 1823, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:106272)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I have confirmed that the file itself is valid and loads fine locally. How can I load it from S3?
You can use urllib2 and try this:
import urllib2  # Python 2 only
from lxml import etree
xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
xsd_contents = urllib2.urlopen(xsd_url).read()  # raw bytes, not a decoded string
xmlschema_doc = etree.fromstring(xsd_contents)
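urllib2 exists only on Python 2. On Python 3 the same idea could be sketched with urllib.request, roughly as below. The helper name load_xml_from_url is made up for illustration and is not part of lxml or the standard library.

```python
from urllib.request import urlopen

from lxml import etree


def load_xml_from_url(url):
    """Fetch a URL and parse the undecoded response body as XML.

    Hypothetical helper for illustration only.
    """
    with urlopen(url) as response:
        raw = response.read()  # bytes, so lxml can honor the encoding declaration
    return etree.fromstring(raw)
```

Calling load_xml_from_url(xsd_url) with the URL from the question should then return the root element of the schema document, assuming the file is reachable.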
Use .content, which is of type bytes:
>>> import requests
>>> from lxml import etree
>>> xsd_url = 'https://s3-us-west-1.amazonaws.com/premiere-avails/movie.xsd.xml'
>>> node = etree.fromstring(requests.get(xsd_url).content)
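The problem is that your XML file declares an encoding, so decoding it is the XML parser's job. But your code uses .text, which asks requests to do the decoding.
That would normally be the right thing to do, but the XML parser does not like being handed something that is already decoded and then being told how to decode it, so it raises the exception you see. The fix? Don't have requests decode it.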
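The difference can be reproduced without any network access. The following is a minimal sketch, assuming a small in-memory document that stands in for the real XSD file; the sample XML and the helper name parse_outcome are invented for illustration.

```python
from lxml import etree

# Stand-in for the downloaded file: note the encoding declaration.
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?><schema><element name="movie"/></schema>'

# Roughly what requests' .text gives you: an already decoded str.
xml_text = xml_bytes.decode("utf-8")


def parse_outcome(data):
    """Return the root tag, or "ValueError" if lxml refuses the input."""
    try:
        return etree.fromstring(data).tag
    except ValueError:
        return "ValueError"


print(parse_outcome(xml_bytes))  # bytes input: lxml decodes it itself
print(parse_outcome(xml_text))   # str input with a declaration: rejected
```

The bytes input parses, while the decoded string with the same declaration is rejected, which is the situation .content and .text put you in respectively.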