在 Xpath 中传递 Cyrilics 包含 returns XML 值错误。垃圾。 Python 2

Question

我正在尝试通过 xpath Text contains 获取这样的元素。

<p><strong>Полное наименование</strong></p>

结果我收到了这个错误。

    In [4]: response.xpath("//p[contains(text(),'Полное')]").extract()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-7e122465e645> in <module>()
----> 1 response.xpath("//p[contains(text(),'Полное')]").extract()

c:\python27\lib\site-packages\scrapy\http\response\text.pyc in xpath(self, query, **kwargs)
    117
    118     def xpath(self, query, **kwargs):
--> 119         return self.selector.xpath(query, **kwargs)
    120
    121     def css(self, query):

c:\python27\lib\site-packages\parsel\selector.pyc in xpath(self, query, namespaces, **kwargs)
    226             result = xpathev(query, namespaces=nsp,
    227                              smart_strings=self._lxml_smart_strings,
--> 228                              **kwargs)
    229         except etree.XPathError as exc:
    230             msg = u"XPath error: %s in %s" % (exc, query)

src\lxml\etree.pyx in lxml.etree._Element.xpath()

src\lxml\xpath.pxi in lxml.etree.XPathElementEvaluator.__call__()

src\lxml\apihelpers.pxi in lxml.etree._utf8()

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

这是我的 xpath

response.xpath("//p[contains(text(),'Полное')]").extract()

“Полное”是我用于搜索的俄语文本。我该如何修复错误？

Answer 1

用 u 为您的表达式字符串添加前缀以生成 unicode 字符串：

response.xpath(u"//p[contains(text(),'Полное')]").extract()

在 Xpath 中传递 Cyrilics 包含 returns XML 值错误。垃圾。 Python 2

Passing Cyrilics in Xpath Contains returns XML value error. Scrapy. Python 2

xpath

scrapy

cyrillic