How to use the scrapy-redis example

Question

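I have read the scrapy-redis example, but I still don't quite understand how to use it.

I ran the spider named dmoz and it works well. But when I start another spider named mycrawler_redis, it gets nothing.

Besides, I'm quite confused about how the request queue is set. I didn't find any code in the example project that illustrates the request queue setting.

And if spiders on different machines want to share the same request queue, how can I do that? It seems I should first make the slave machines connect to the master's redis, but I'm not sure which part of spider.py the relevant code belongs in, or whether I just type it on the command line.

I'm very new to scrapy-redis, and any help would be appreciated!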

Answer 1

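If the example spider works and your custom spider doesn't, then there must be something you have done wrong. Update your question with the code, including all relevant parts, so we can see what went wrong.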

Besides I'm quite confused about how the request queue is set. I didn't find any piece of code in the example-project which illustrate the request queue setting.

As far as your spider is concerned, this is done through the appropriate project settings. For example, if you want FIFO:

# Enables scheduling storing requests queue in redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Don't cleanup redis queues, allows to pause/resume crawls.
SCHEDULER_PERSIST = True

# Schedule requests using a queue (FIFO).
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.SpiderQueue'
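To make the FIFO behaviour concrete, here is a minimal in-memory sketch of a queue that several spiders drain together. It assumes the real queue is a Redis list that is pushed at one end and popped at the other; the `deque`, the `push`/`pop` helpers, and the URLs are hypothetical stand-ins, not scrapy-redis code.

```python
from collections import deque

# In-memory stand-in for one Redis list shared by all spiders (assumption:
# the real queue lives in Redis, not in process memory).
shared_queue = deque()


def push(url):
    """Add a request at the left end, similar to Redis LPUSH."""
    shared_queue.appendleft(url)


def pop():
    """Take a request from the right end, similar to Redis RPOP; None if empty."""
    return shared_queue.pop() if shared_queue else None


# Hypothetical URLs, pushed in this order.
for url in ["http://example.com/a", "http://example.com/b", "http://example.com/c"]:
    push(url)

# Two workers taking turns still receive the requests oldest-first.
worker_1_first = pop()
worker_2_first = pop()
worker_1_second = pop()
```

Because every worker pops from the same list, no request is handed out twice and the order stays first-in, first-out regardless of which worker asks.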

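As for the implementation, queuing is handled through RedisSpider, which your spider must inherit from. You can find the code that enqueues requests here: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/scheduler.py#L73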
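The idea behind that enqueue step can be sketched in plain Python: skip a request that has already been seen, otherwise put it on the shared queue. This is an illustration under assumptions, not the library's code. `ToyScheduler`, its in-memory `seen` set, and its `queue` are hypothetical stand-ins for the Redis-backed duplicate filter and request queue.

```python
from collections import deque


class ToyScheduler:
    """In-memory illustration only; not scrapy-redis code."""

    def __init__(self):
        self.seen = set()     # stands in for the Redis-backed duplicate filter
        self.queue = deque()  # stands in for the Redis-backed request queue

    def enqueue_request(self, url, dont_filter=False):
        """Queue url unless it was seen before; return True if it was queued."""
        if not dont_filter and url in self.seen:
            return False
        self.seen.add(url)
        self.queue.appendleft(url)
        return True

    def next_request(self):
        """Return the oldest queued url, or None when the queue is empty."""
        return self.queue.pop() if self.queue else None


scheduler = ToyScheduler()
results = [
    scheduler.enqueue_request("http://example.com/a"),
    scheduler.enqueue_request("http://example.com/a"),  # duplicate, dropped
    scheduler.enqueue_request("http://example.com/b"),
]
```

Keeping both the filter and the queue in one shared store is what would let several spider processes cooperate without crawling the same page twice.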

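As for the connection, you don't need to connect to the redis machine manually. You just specify the host and port information in the settings: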

REDIS_HOST = 'localhost'
REDIS_PORT = 6379
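For spiders on several machines to share one queue, a reasonable sketch is to give every machine the same settings and point `REDIS_HOST` at the single machine that runs the Redis server. The address below is a hypothetical placeholder, and the sketch assumes that server is reachable over the network from the other machines.

```python
# settings.py, identical on every machine that runs the spider.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
SCHEDULER_PERSIST = True

# Hypothetical address of the machine running the Redis server
# (192.0.2.10 is a reserved documentation address; substitute your own).
REDIS_HOST = "192.0.2.10"
REDIS_PORT = 6379
```

With this in place, nothing extra goes into spider.py or onto the command line for the connection itself; each spider process picks up the host and port from the project settings.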

The connection itself is configured in connection.py: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/connection.py An example of its usage can be found in several places: https://github.com/darkrho/scrapy-redis/blob/a295b1854e3c3d1fddcd02ffd89ff30a6bea776f/scrapy_redis/pipelines.py#L17


scrapy

web-scraping

scrapy-spider