Error with Python and XML
I'm getting an error when trying to get a value from XML: "Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration."
Here is my code:
import requests
import lxml.etree
from requests.auth import HTTPBasicAuth
r = requests.get("https://somelinkhere/folder/?parameter=abc", auth=HTTPBasicAuth('username', 'password'))
print(r.text)
root = lxml.etree.fromstring(r.text)
textelem = root.find("opensearch:totalResults")
print(textelem.text)
I get this error:
Traceback (most recent call last):
File "tickets2.py", line 8, in <module>
root = lxml.etree.fromstring(r.text)
File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:82934)
File "src/lxml/parser.pxi", line 1814, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:124471)
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
This is what the XML looks like; I'm trying to grab the value on the last line.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:apple-wallpapers="http://www.apple.com/ilife/wallpapers" xmlns:g-custom="http://base.google.com/cns/1.0" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:cc="http://web.resource.org/cc/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:g-core="http://base.google.com/ns/1.0">
<title>Feed from some link here</title>
<link rel="self" href="https://somelinkhere/folder/?parameter=abc" />
<link rel="first" href="https://somelinkhere/folder/?parameter=abc" />
<id>https://somelinkhere/folder/?parameter=abc</id>
<updated>2018-03-06T17:48:09Z</updated>
<dc:creator>company.com</dc:creator>
<dc:date>2018-03-06T17:48:09Z</dc:date>
<opensearch:totalResults>4</opensearch:totalResults>
I have tried various changes based on links such as https://twigstechtips.blogspot.com/2013/06/python-lxml-strings-with-encoding.html and http://makble.com/how-to-parse-xml-with-python-and-lxml, but I still run into the same error.
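Try using r.content instead of r.text. r.text guesses the text encoding and decodes the body into a Unicode string, whereas r.content gives you the response body as bytes. lxml refuses Unicode input that still carries an encoding declaration, but given bytes it reads the declaration and decodes the document itself. (See http://docs.python-requests.org/en/latest/user/quickstart/#response-content.)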
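The difference can be reproduced without a network request. This sketch assumes a hypothetical, trimmed response body (BODY) and decodes it by hand to imitate what r.text does; it is an illustration, not the real server response.

```python
import lxml.etree

# Hypothetical, trimmed response body with an encoding declaration.
# r.content would hold bytes like these; r.text would hold the decoded str.
BODY = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    b'<opensearch:totalResults>4</opensearch:totalResults>'
    b'</feed>'
)


def parse_like_r_text(body):
    """Decode first (as r.text does), then parse; return lxml's error message."""
    try:
        lxml.etree.fromstring(body.decode("utf-8"))
    except ValueError as exc:
        return str(exc)
    return None


def parse_like_r_content(body):
    """Parse the raw bytes (as with r.content); lxml honours the declaration."""
    return lxml.etree.fromstring(body)


print(parse_like_r_text(BODY))
print(parse_like_r_content(BODY).tag)  # {http://www.w3.org/2005/Atom}feed
```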
You can also use r.raw. For details, see parsing XML file gets UnicodeEncodeError (ElementTree) / ValueError (lxml).
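r.raw is a file-like object of bytes, so it can be handed to lxml.etree.parse() instead of fromstring(). The sketch below uses io.BytesIO as an offline stand-in for r.raw; the requests lines in the trailing comment assume stream=True and are untested against your server. One caveat worth checking: requests does not decompress r.raw for you, so a gzip-encoded response needs r.raw.decode_content = True before parsing.

```python
import io

import lxml.etree


def parse_stream(stream):
    """Parse XML from any binary file-like object (e.g. r.raw) and return the root."""
    # parse() reads bytes from the stream, so the encoding declaration is allowed.
    return lxml.etree.parse(stream).getroot()


# Offline stand-in for r.raw: a hypothetical, trimmed response body.
fake_raw = io.BytesIO(
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Feed</title></feed>'
)
root = parse_stream(fake_raw)
print(root.tag)  # {http://www.w3.org/2005/Atom}feed

# With requests (assumption: untested against the real server):
#   r = requests.get(url, auth=HTTPBasicAuth("username", "password"), stream=True)
#   r.raw.decode_content = True  # undo gzip/deflate Content-Encoding, if any
#   root = parse_stream(r.raw)
```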
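Once that is fixed, you will run into a namespace problem. The prefix opensearch on the element you are looking for (opensearch:totalResults) is bound to the URI http://a9.com/-/spec/opensearch/1.1/. lxml identifies elements by namespace URI rather than by prefix, so find() cannot resolve the prefixed name "opensearch:totalResults" on its own.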
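To see why the prefixed name does not work, it helps to look at how lxml names parsed elements. This sketch parses a hypothetical, trimmed copy of the feed and prints each child's prefix and tag: the tag carries the URI, not the prefix. Note that the unprefixed title element is namespaced too, because the feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom").

```python
import lxml.etree

# Hypothetical, trimmed copy of the feed from the question.
BODY = (
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    b'<title>Feed from some link here</title>'
    b'<opensearch:totalResults>4</opensearch:totalResults>'
    b'</feed>'
)

root = lxml.etree.fromstring(BODY)
for child in root:
    # child.tag is "{namespace-uri}localname"; child.prefix is just the source spelling.
    print(child.prefix, child.tag)

# nsmap shows which URI each prefix declared on <feed> is bound to.
print(root.nsmap["opensearch"])
```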
You can find the element by combining the namespace URI and the local name (Clark notation):
{http://a9.com/-/spec/opensearch/1.1/}totalResults
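Rather than typing the braces by hand, the Clark name can also be built with lxml.etree.QName, which takes a namespace URI and a local name. A small sketch of that alternative; plain string formatting gives the same result.

```python
import lxml.etree

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"

# QName(uri, localname) builds the Clark-notation name "{uri}localname".
qname = lxml.etree.QName(OPENSEARCH_NS, "totalResults")
print(qname.text)  # {http://a9.com/-/spec/opensearch/1.1/}totalResults

# The same string via an f-string: doubled braces produce literal braces.
clark = f"{{{OPENSEARCH_NS}}}totalResults"
print(clark == qname.text)  # True
```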
For more details, see http://lxml.de/tutorial.html#namespaces.
Here is an example with both changes applied:
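Another common approach, besides Clark notation, is to keep the prefixed path and pass a prefix-to-URI dictionary through the namespaces argument, which find(), findtext() and xpath() all accept. The dictionary name NSMAP and the trimmed BODY below are illustrative assumptions.

```python
import lxml.etree

# Hypothetical, trimmed copy of the feed from the question.
BODY = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    b'<opensearch:totalResults>4</opensearch:totalResults>'
    b'</feed>'
)

# You choose the prefix in this dict; only the URI has to match the document.
NSMAP = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}

root = lxml.etree.fromstring(BODY)
elem = root.find("opensearch:totalResults", namespaces=NSMAP)
print(elem.text)  # 4

# findtext() returns the text directly (or None if the element is missing).
count = root.findtext("opensearch:totalResults", namespaces=NSMAP)

# XPath takes the same mapping; string(...) returns the element's text content.
via_xpath = root.xpath("string(opensearch:totalResults)", namespaces=NSMAP)
```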
import lxml.etree

OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"  # namespace URI in Clark notation
root = lxml.etree.fromstring(r.content)  # bytes, not r.text
textelem = root.find(OPENSEARCH + "totalResults")
print(textelem.text)
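Putting both fixes together for Python 3, a minimal sketch might look like the following. The function names total_results and fetch_total_results are hypothetical; the URL and credentials are the placeholders from the question, and the raise_for_status() call is an added assumption that a failed login should stop the script before any parsing happens.

```python
import lxml.etree

OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"  # Clark-notation prefix


def total_results(xml_bytes):
    """Return opensearch:totalResults as an int, or None if it is missing."""
    root = lxml.etree.fromstring(xml_bytes)  # bytes in, so the declaration is accepted
    text = root.findtext(OPENSEARCH + "totalResults")
    return int(text) if text is not None else None


def fetch_total_results(url, username, password):
    """Download the feed with HTTP basic auth and return its result count."""
    # Imported here so total_results() can be used without requests installed.
    import requests
    from requests.auth import HTTPBasicAuth

    r = requests.get(url, auth=HTTPBasicAuth(username, password))
    r.raise_for_status()  # stop on 401/404 instead of parsing an error page
    return total_results(r.content)  # r.content is bytes, unlike r.text


# Usage with the placeholders from the question (needs network access):
# print(fetch_total_results("https://somelinkhere/folder/?parameter=abc",
#                           "username", "password"))
```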