Is there a way to restart a scrapy crawler?
I want to know if there is a way to restart a scrapy crawler. This is what my code looks like:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from scrapy.crawler import CrawlerProcess

results = set()

class SitemapCrawler(CrawlSpider):
    name = "Crawler"
    start_urls = ['https://www.example.com']
    allowed_domains = ['www.example.com']
    rules = [Rule(LinkExtractor(), callback='parse_links', follow=True)]

    def parse_links(self, response):
        # collect the page URL and every outgoing link
        href = response.xpath('//a/@href').getall()
        results.add(response.url)
        for link in href:
            results.add(link)

def start():
    process = CrawlerProcess()
    process.crawl(SitemapCrawler)
    process.start()
    for link in results:
        print(link)
If I try to call start() twice, it runs once and then gives me this error:

    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
I know this is a general question, so I'm not expecting any code; I just want to know how to solve this problem. Thanks in advance.
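CrawlerProcess.start() runs the Twisted reactor, and a Twisted reactor cannot be restarted once it has been stopped; that is what ReactorNotRestartable is telling you. The usual way around it is to use CrawlerRunner instead, which leaves starting and stopping the reactor to you, so the reactor is started exactly once no matter how many crawls you schedule: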
from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # Spider definition goes here (name, start_urls, parse, ...)
    ...

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()
d = runner.crawl(MySpider)

def finished(result):
    # addCallback passes the crawl result as an argument
    print("finished :D")
    reactor.stop()  # stop the reactor so the script can exit

d.addCallback(finished)
reactor.run()  # blocks here until reactor.stop() is called
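With the runner in hand you can also chain several crawls before stopping the reactor, which effectively "restarts" the spider. A minimal, self-contained sketch of that pattern (MySpider, its name, and its URLs are placeholders, not from the original post):

from twisted.internet import reactor, defer
import scrapy
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # placeholder spider: collects outgoing links, like parse_links above
    name = "my_spider"
    start_urls = ['https://www.example.com']

    def parse(self, response):
        for link in response.xpath('//a/@href').getall():
            yield {'link': link}

configure_logging({'LOG_FORMAT': '%(levelname)s: %(message)s'})
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider)  # first run
    yield runner.crawl(MySpider)  # second run: no ReactorNotRestartable
    reactor.stop()                # stop the reactor after both runs finish

crawl()
reactor.run()  # blocks until crawl() stops the reactor

Because runner.crawl() returns a Deferred, each yield waits for one crawl to finish before the next starts, while the reactor itself is only ever started once.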