How to get Scrapy Response in Scrapy Shell

Question

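I'm new to Scrapy and have gone through many tutorials. They all seem to assume that you already know how to parse the response object and that you can call the response callback right away (e.g. scrapy.Request(url=url, callback=self.parse)). In practice, though, I need to work out my CSS selectors by trial and error.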

How do I get hold of a Scrapy response object so that I can work with it interactively?

What scrapy.Request(url=url) gives back seems to be something I can't use (see screenshot): it has nothing for parsing HTML, e.g. no .css method.

Answer 1

Scrapy works asynchronously, so the request's callback argument is what determines which method will receive the response object.

So if you really have gone through a lot of tutorials, you will have seen something like this:

def parse(self, response):
    ...

    yield Request(url='myurl', callback=self.some_other_method)

def some_other_method(self, response):
    ...
    # play with the response object of the request done to `myurl`

This way, each response is handed in turn to the callback of the request that produced it.

Answer 2

In the shell, use fetch(request).

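fetch(url[, redirect=True]) - fetch a new response from the given URL and update all related objects accordingly. You can optionally ask for HTTP 3xx redirections not to be followed by passing redirect=False.
fetch(request) - fetch a new response from the given request and update all related objects accordingly.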
