How to stop a Scrapy crawler

Question

I want to stop the spider when a certain condition is met. I tried doing this: raise CloseSpider('Some Text') and

sys.exit("SHUT DOWN EVERYTHING!")

But it does not stop. Here is the code. Raising the exception instead of using return does not work either, because the spider keeps crawling:

import scrapy
from scrapy.http import Request

from tutorial.items import DmozItem
from scrapy.exceptions import CloseSpider
import sys

class DmozSpider(scrapy.Spider):
    name = "tutorial"
    allowed_domains = ["jabong.com"]
    start_urls = [
        "http://www.jabong.com/women/shoes/sandals/?page=1"
    ]

    page_index = 1

    def parse(self, response):
        products = response.xpath('//li')

        if products:
            for product in products:
                item = DmozItem()
                item_url = product.xpath('@data-url').extract()
                item_url = "http://www.jabong.com/" + item_url[0] if item_url else ''
                if item_url:
                    # Request the product page and pass the item along in meta
                    request = Request(
                        url=item_url,
                        callback=self.parse_page2,
                        meta={"item": item},
                        headers={"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"},
                    )
                    yield request
        else:
            # No products on this page: stop yielding requests
            return

        # Queue the next listing page
        self.page_index += 1
        if self.page_index:
            yield Request(
                url="http://www.jabong.com/women/shoes/sandals/?page=%s" % self.page_index,
                headers={"Referer": "http://www.jabong.com/women/shoes/sandals/",
                         "X-Requested-With": "XMLHttpRequest"},
                callback=self.parse,
            )

    def parse_page2(self, response):
        item = response.meta['item']
        item['site_name'] = 'jabong'
        item['tags'] = ''
        yield item

Update: even if I raise CloseSpider instead of using return, my spider still does not stop.

Answer 1

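return can also work here, not as a forced stop but because of the crawling logic: you no longer yield any requests.
But keep in mind that what you might interpret as "the spider doesn't close" is actually the remaining requests that have already started being processed in the pipeline and need more time to finish. So the spider will not stop at the same instant the return is executed, because there are still requests in the pipeline. Once they have all been processed, and if no new ones are created, the spider will finally stop.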
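To make the point about remaining requests concrete, here is a minimal sketch in plain Python. It is a toy model, not Scrapy's engine: the names parse, run and last_page, the two made-up products per page, and the last-in-first-out handling of pending requests are all assumptions chosen for illustration.

```python
def parse(page, last_page):
    """Hypothetical listing callback: yields follow-up requests, or nothing."""
    if page > last_page:
        return  # no products on this page: stop producing new requests
    for n in range(2):  # two made-up products per listing page
        yield ("product", f"p{page}-{n}")
    yield ("listing", page + 1)  # next listing page


def run(last_page):
    """Handle pending requests until none are left; return them in order."""
    pending = [("listing", 1)]  # toy pending list, newest request first (assumption)
    handled = []
    while pending:  # the crawl ends only when nothing is pending
        kind, value = pending.pop()
        handled.append((kind, value))
        if kind == "listing":
            pending.extend(parse(value, last_page))
    return handled


handled = run(last_page=1)
for entry in handled:
    print(entry)
```

With last_page=1, the empty listing page 2 is handled second, and the two product requests queued earlier are still handled after it. The callback's return only stops new requests from being added; the crawl ends once the pending list is empty.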

python

scrapy