How to extract values from a Scrapy response
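I'm working on a project with Scrapy, and I have the contents of an HTML file.
I want to extract the title value, for example "ELK set up for creating a SIEM Solution_Upwork Request".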
<a href="https://discuss.elastic.co/t/elk-set-up-for-creating-a-siem-solution-upwork-request/286299" class="title raw-link raw-topic-link">ELK set up for creating a SIEM Solution_Upwork Request</a>
I'm getting all the titles on the page with the following:
result = response.xpath('''//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]''').extract()
Printed result:
[<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
<Selector xpath='//*[contains(concat( " ", @class, " " ), concat( " ", "raw-topic-link", " " ))]' data='<a href="https://discuss.elastic.co/t...'>,
...
I tried:
result.xpath("""//[@id="raw-topic-link"]/text()""").extract()
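But I get either an empty list or an invalid expression error.
Any idea how to solve this? Are there any useful online resources for learning more about the different ways to extract values from divs, classes, links, and so on?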
You can try something like this:
response.xpath('//a[@class="title raw-link raw-topic-link"]/text()')  # .get() or .getall()