How to get depth of each crawler with Scrapy
Is there a way to track the depth of each crawl? I am scraping some websites recursively, with a setup similar to the code below.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:          # placeholder condition
            yield scrapy.Request(url=url,   # placeholder url
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:          # placeholder condition
            with open(filename, "a") as file:
                file.write(...)             # record depth and url here
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
This approach doesn't work. What I want is a record of each crawl's activity, for example:

crawler depth1 URL1
crawler depth2 URL2
...

Thanks in advance.
I think you are almost there. Try this code:
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:          # placeholder condition
            yield scrapy.Request(url=url,   # placeholder url
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:          # placeholder condition
            # Open in append mode: "w+" would truncate the file on every
            # call to parse(), so only the last record would survive.
            with open(filename, "a") as f:
                f.write(f"{response.url} {cur_crawl_depth}\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
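As a side note, Scrapy already tracks depth for you: the built-in DepthMiddleware (enabled by default) sets response.meta['depth'] to 0 for start requests and to the parent's depth plus one for every followed link, so you don't have to carry the counter in meta yourself. Here is a minimal self-contained sketch of that approach; the spider name, start URL, and log file name are placeholders:

import scrapy

class DepthLogSpider(scrapy.Spider):
    """Logs 'crawler depth<N> <URL>' for every page visited."""
    name = "depth_log"
    start_urls = ["https://example.com"]  # placeholder start URL

    def parse(self, response):
        # DepthMiddleware maintains this value automatically:
        # 0 for start requests, parent depth + 1 for followed links.
        depth = response.meta.get('depth', 0)
        with open("crawl_activity.log", "a") as f:  # placeholder file name
            f.write(f"crawler depth{depth} {response.url}\n")
        # Follow every link on the page; recursion depth can be capped
        # with the DEPTH_LIMIT setting instead of a hand-rolled condition.
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

With this, the meta={'depth': ...} bookkeeping above becomes unnecessary.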