What do "set_crawler" and "from_crawler" in 'crawl.py' do in Scrapy?

Question

I don't understand these functions. If I subclass Spider or CrawlSpider, should I override them? If not, why not?

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super(CrawlSpider, cls).from_crawler(crawler, *args, **kwargs)
    spider._follow_links = crawler.settings.getbool(
                                   'CRAWLSPIDER_FOLLOW_LINKS', True)
    return spider

def set_crawler(self, crawler):
    super(CrawlSpider, self).set_crawler(crawler)
    self._follow_links = crawler.settings.getbool(
                                 'CRAWLSPIDER_FOLLOW_LINKS', True)

Answer 1

Usually you don't need to override these methods, but it depends on what you want to do.

The from_crawler method (the one with the @classmethod decorator) is a factory method that Scrapy uses to instantiate the object (spider, extension, middleware, etc.) on which you define it.

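It is typically used to get a reference to the crawler object (which holds references to objects such as settings and stats) and then either pass that reference as an argument to the object being created or set attributes on the new object from it.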

In the specific example you pasted, it reads the value of the CRAWLSPIDER_FOLLOW_LINKS setting and stores it in the spider's _follow_links attribute.
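The pattern can be sketched without Scrapy installed. In the sketch below, FakeSettings, FakeCrawler, BaseSpider and LinkFollowingSpider are hypothetical stand-ins that only mimic the shape of Scrapy's objects; in a real project the spider would subclass scrapy.Spider or CrawlSpider and Scrapy itself would supply the crawler.

```python
# Stand-ins that mimic the shape of Scrapy's objects so the sketch runs
# without Scrapy installed. They are NOT Scrapy's real classes.
class FakeSettings:
    def __init__(self, values):
        self._values = dict(values)

    def getbool(self, name, default=False):
        # Simplified: Scrapy's real getbool also parses strings such as "True" or "0".
        return bool(self._values.get(name, default))


class FakeCrawler:
    def __init__(self, settings):
        self.settings = settings


class BaseSpider:
    """Plays the role of scrapy.Spider in this sketch."""

    def __init__(self, name=None):
        self.name = name

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Factory method: build the instance, then attach the crawler to it.
        spider = cls(*args, **kwargs)
        spider.crawler = crawler
        spider.settings = crawler.settings
        return spider


class LinkFollowingSpider(BaseSpider):
    """Plays the role of CrawlSpider (or of your own subclass)."""

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Let the parent create the object, then copy a setting onto it.
        spider = super().from_crawler(crawler, *args, **kwargs)
        spider._follow_links = crawler.settings.getbool(
            "CRAWLSPIDER_FOLLOW_LINKS", True
        )
        return spider


crawler = FakeCrawler(FakeSettings({"CRAWLSPIDER_FOLLOW_LINKS": False}))
spider = LinkFollowingSpider.from_crawler(crawler, name="demo")
print(spider.name, spider._follow_links)  # prints: demo False
```

If you do override from_crawler in your own spider, the important part is to call the parent implementation first and return the spider it created, as LinkFollowingSpider does here.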

You can see another simple example of the from_crawler method in this extension, which uses the crawler object to read the value of a setting, pass it as a parameter to the extension, and connect some signals to some of the extension's methods.
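Since the linked extension is not reproduced here, the following is only a rough sketch of that kind of extension, not the original code. FakeSettings, FakeSignals, FakeCrawler, the two signal constants, ItemCountExtension and the ITEMCOUNT_REPORT_EVERY setting are all hypothetical names made up for illustration; in Scrapy the signals would come from scrapy.signals and the crawler would be provided by the framework.

```python
# Stand-ins so the sketch runs without Scrapy; NOT Scrapy's real classes.
class FakeSettings:
    def __init__(self, values):
        self._values = dict(values)

    def getint(self, name, default=0):
        return int(self._values.get(name, default))


class FakeSignals:
    """Minimal stand-in for the crawler's signal manager."""

    def __init__(self):
        self._receivers = {}

    def connect(self, receiver, signal):
        self._receivers.setdefault(signal, []).append(receiver)

    def send(self, signal, **kwargs):
        # Hypothetical helper that calls every receiver connected to the signal.
        for receiver in self._receivers.get(signal, []):
            receiver(**kwargs)


class FakeCrawler:
    def __init__(self, settings):
        self.settings = settings
        self.signals = FakeSignals()


# Stand-ins for scrapy.signals.spider_opened and scrapy.signals.item_scraped.
SPIDER_OPENED = object()
ITEM_SCRAPED = object()


class ItemCountExtension:
    """Hypothetical extension: counts items and reports every N of them."""

    def __init__(self, report_every):
        self.report_every = report_every
        self.items_seen = 0
        self.messages = []

    @classmethod
    def from_crawler(cls, crawler):
        # 1. Read a setting and pass it to the constructor.
        ext = cls(crawler.settings.getint("ITEMCOUNT_REPORT_EVERY", 2))
        # 2. Connect signals to the extension's own methods.
        crawler.signals.connect(ext.spider_opened, signal=SPIDER_OPENED)
        crawler.signals.connect(ext.item_scraped, signal=ITEM_SCRAPED)
        return ext

    def spider_opened(self, spider):
        self.messages.append(f"opened {spider}")

    def item_scraped(self, item, spider):
        self.items_seen += 1
        if self.items_seen % self.report_every == 0:
            self.messages.append(f"scraped {self.items_seen} items")


crawler = FakeCrawler(FakeSettings({"ITEMCOUNT_REPORT_EVERY": 2}))
ext = ItemCountExtension.from_crawler(crawler)

crawler.signals.send(SPIDER_OPENED, spider="demo")
for n in range(3):
    crawler.signals.send(ITEM_SCRAPED, item={"n": n}, spider="demo")

print(ext.messages)  # prints: ['opened demo', 'scraped 2 items']
```

Keeping the constructor free of any crawler dependency, and doing all the wiring in from_crawler, means the extension can be instantiated directly in tests with a plain integer.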

The set_crawler method has been deprecated in recent Scrapy versions and should be avoided.
