How to extract values from a Scrapy response

Question

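I'm working on a project with Scrapy, and I have the contents of an HTML file. I want to extract the title value, for example "ELK set up for creating a SIEM Solution_Upwork Request".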

<a href="https://discuss.elastic.co/t/elk-set-up-for-creating-a-siem-solution-upwork-request/286299" class="title raw-link raw-topic-link">ELK set up for creating a SIEM Solution_Upwork Request</a>

I'm using the following to get all the titles on the page:

result = response.xpath('''//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]''').extract()

Printing the result gives:

[<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
 <Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
 <Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
 <Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>, 
...

I tried:

result.xpath("""//[@id="raw-topic-link"]/text()""").extract()

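But I get either an empty list or an invalid-expression error. Any idea how to fix this? Are there any useful online resources for learning more about the different ways to extract values from divs, classes, links, and so on?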

Answer 1

You can try something like this:

response.xpath('//a[@class="title raw-link raw-topic-link"]/text()')  # .get() or .getall()

html

python

xpath

scrapy