How to get depth of each crawler with Scrapy
Is there a way to track the depth of each crawl? I am scraping some websites recursively, with a setup similar to the code below.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:          # placeholder condition
            yield scrapy.Request(url=url,   # placeholder url
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:          # placeholder condition
            with open(filename, "a") as file:
                file.write(...)             # record depth and url here
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
This approach doesn't work. What I want is a record of each crawl's activity, for example:

crawler depth1 URL1
crawler depth2 URL2
...

Thanks in advance.
I think you are almost there. Try this code:
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:          # placeholder condition
            yield scrapy.Request(url=url,   # placeholder url
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:          # placeholder condition
            # Open in append mode: "w+" would truncate the file on every
            # call to parse(), so only the last record would survive.
            with open(filename, "a") as f:
                f.write(f"{response.url} {cur_crawl_depth}\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
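As a side note, Scrapy already tracks depth for you: the built-in DepthMiddleware (enabled by default) sets response.meta['depth'] to 0 for start requests and to the parent's depth plus one for every followed link, so you don't have to carry the counter in meta yourself. Here is a minimal self-contained sketch of that approach; the spider name, start URL, and log file name are placeholders:

import scrapy

class DepthLogSpider(scrapy.Spider):
    """Logs 'crawler depth<N> <URL>' for every page visited."""
    name = "depth_log"
    start_urls = ["https://example.com"]  # placeholder start URL

    def parse(self, response):
        # DepthMiddleware maintains this value automatically:
        # 0 for start requests, parent depth + 1 for followed links.
        depth = response.meta.get('depth', 0)
        with open("crawl_activity.log", "a") as f:  # placeholder file name
            f.write(f"crawler depth{depth} {response.url}\n")
        # Follow every link on the page; recursion depth can be capped
        # with the DEPTH_LIMIT setting instead of a hand-rolled condition.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

With this, the meta={'depth': ...} bookkeeping above becomes unnecessary.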