How can I use a rotating proxy with Scrapy?
After running pip install scrapy-rotating-proxies, I added this to settings.py:
ROTATING_PROXY_LIST = ['http://209.50.52.162:9050']
DOWNLOADER_MIDDLEWARES = {
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
}
Then, when I run the spider with scrapy crawl test, it shows this:
2021-05-03 15:03:32 [rotating_proxies.middlewares] WARNING: No proxies available; marking all proxies as unchecked
2021-05-03 15:03:50 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-03 15:03:50 [rotating_proxies.middlewares] INFO: Proxies(good: 0, dead: 0, unchecked: 0, reanimated: 1, mean backoff time: 0s)
2021-05-03 15:03:53 [rotating_proxies.expire] DEBUG: Proxy <http://209.50.52.162:9050> is DEAD
2021-05-03 15:03:53 [rotating_proxies.middlewares] DEBUG: Retrying <GET https://www.google.com> with another proxy (failed 3 times, max retries: 5)
How can I fix this?
Note this message in the log:
DEBUG: Proxy <http://209.50.52.162:9050> is DEAD
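To confirm that the proxy is unreachable outside Scrapy as well, you could try fetching a page through it with the Python standard library. This is a minimal sketch, assuming the proxy speaks plain HTTP (not SOCKS); check_proxy is a hypothetical helper (not part of scrapy-rotating-proxies), and http://example.com is an arbitrary test URL:

```python
import http.client
import urllib.request


def check_proxy(proxy: str, test_url: str = "http://example.com", timeout: float = 10.0) -> bool:
    """Return True if test_url can be fetched through the given HTTP proxy.

    Hypothetical helper for manual checks; not part of scrapy-rotating-proxies.
    """
    # Send both http and https requests through the proxy. An explicit
    # ProxyHandler mapping takes precedence over *_proxy environment variables.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            # Error statuses raise HTTPError, so reaching here normally means success.
            return 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, connection refused and timeouts;
        # HTTPException covers garbage replies (e.g. a non-HTTP service on that port).
        return False
```

If check_proxy("http://209.50.52.162:9050") returns False from your machine, that agrees with the DEAD status the middleware reports.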
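With only one proxy in ROTATING_PROXY_LIST, once that proxy is marked dead there is nothing left to rotate to, which is why the log also reports "No proxies available". You need to add more (working) proxies, as shown in the documentation: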
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    # ...
]
You can get proxy lists from many sites.
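Downloaded proxy lists usually come as plain text with one host:port per line. A minimal sketch for turning such text into a value for ROTATING_PROXY_LIST, assuming that one-per-line format; parse_proxy_list is a hypothetical helper that skips blank and # comment lines, adds an http:// scheme to bare entries (matching the 'http://...' form used in the question), and drops duplicates:

```python
from typing import Iterable, List


def parse_proxy_list(lines: Iterable[str], default_scheme: str = "http") -> List[str]:
    """Turn raw 'host:port' lines into a de-duplicated list of proxy URLs.

    Hypothetical helper; not part of scrapy-rotating-proxies.
    """
    proxies: List[str] = []
    seen = set()
    for raw in lines:
        line = raw.strip()
        # Ignore empty lines and comments.
        if not line or line.startswith("#"):
            continue
        # Give bare host:port entries an explicit scheme.
        if "://" not in line:
            line = f"{default_scheme}://{line}"
        # Keep the first occurrence of each proxy, preserving order.
        if line not in seen:
            seen.add(line)
            proxies.append(line)
    return proxies
```

In settings.py you could then open a file such as proxies.txt (a hypothetical name) and assign ROTATING_PROXY_LIST = parse_proxy_list(f). The scrapy-rotating-proxies README also describes a ROTATING_PROXY_LIST_PATH setting that points at a file with one proxy per line; check it against the version you installed. Free proxy lists often contain many dead entries, so expect some of them to be marked DEAD as well.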