Scrapy query is returning an empty list

Question

I want to scrape the links from this page: https://www.rentomojo.com/mumbai/furniture/bedroom-furniture-on-rent

Each link is the href of an a element inside a div. My Scrapy code is:

response.css("div.col-xs-6 col-sm-4 col-mgbtm a::attr(href)").extract()

But this returns an empty list.

I even tried using XPath:

response.xpath("//div[@class='col-xs-6 col-sm-4 col-mgbtm']/a/@href").extract()

But that does not work either.

Any help would be appreciated.

Answer 1

I tried your XPath code in the Scrapy shell, started from bash:

scrapy shell https://www.rentomojo.com/mumbai/furniture/bedroom-furniture-on-rent
response.xpath("//div[@class='col-xs-6 col-sm-4 col-mgbtm']/a/@href").extract()

And it works fine.

The CSS code

response.css("div.col-xs-6 col-sm-4 col-mgbtm a::attr(href)").extract()

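returns nothing. A space in a CSS selector is a descendant combinator, so col-sm-4 and col-mgbtm are read as element names rather than as classes of the div. Chaining the classes with dots, as in div.col-xs-6.col-sm-4.col-mgbtm a::attr(href), should match the div.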

Answer 2

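Just write response.css(".col-xs-6 ::attr(href)").extract()

When an element has a class or id, you do not need to write the HTML tag name in the selector. Likewise, you do not need to name the a tag: with a space before it, ::attr(href) extracts the href of every descendant element that has one.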
