How to use Scrapy XPath with XML namespaces?
How can I use Scrapy XPath to extract the <content:encoded> ... </content:encoded>
content from an RSS feed (example below)?
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
<title>Latest – Reason.com</title>
<item>
<pubDate>Thu, 16 Jan 2020 21:40:23 +0000</pubDate>
<content:encoded><![CDATA[<p><span style="font-weight: 400">
Jimmy Meders was scheduled to die by lethal injection today,
but the Georgia parole board has granted him clemency.</span></p>]]>
</content:encoded>
...
I tried response.xpath('//content:encoded').get(), but it didn't work.
Any help would be much appreciated.
You have to declare and register the XML namespace prefix with the selector before using it in an XPath expression:
response.selector.register_namespace('content',
'http://purl.org/rss/1.0/modules/content/')
response.xpath('//content:encoded').getall()