Scrapy, Scrapinghub and Google Cloud Storage: KeyError 'gs' while running the spider on Scrapinghub
I am working on a Scrapy project with Python 3, and the spiders are deployed to Scrapinghub. I am also using Google Cloud Storage to store the scraped files, as described in the official documentation here.
The spider runs absolutely fine when I run it locally, and it deploys to Scrapinghub without any errors. I am using scrapy:1.4-py3 as the stack on Scrapinghub. When the spider runs there, I get the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 77, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 102, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 36, in from_settings
mw = mwcls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/media.py", line 68, in from_crawler
pipe = cls.from_settings(crawler.settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 95, in from_settings
return cls(store_uri, settings=settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 52, in __init__
download_func=download_func)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 234, in __init__
self.store = self._get_store(store_uri)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 269, in _get_store
store_cls = self.STORE_SCHEMES[scheme]
KeyError: 'gs'
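The KeyError comes from the last frame above: the files pipeline resolves its storage backend class by looking up the URI scheme of the store path in its STORE_SCHEMES mapping, and in Scrapy 1.4 that mapping has no 'gs' entry. A paraphrased sketch of the mapping as it appears in Scrapy 1.5's scrapy/pipelines/files.py (not a verbatim copy):

```python
# Paraphrased from scrapy/pipelines/files.py (Scrapy 1.5): the pipeline
# looks up the scheme of FILES_STORE/IMAGES_STORE in this mapping;
# Scrapy 1.4 has no 'gs' key, which is exactly the KeyError above.
STORE_SCHEMES = {
    '': FSFilesStore,       # bare path -> local filesystem
    'file': FSFilesStore,   # file:// URIs
    's3': S3FilesStore,     # s3:// URIs (Amazon S3)
    'gs': GCSFilesStore,    # gs:// URIs (Google Cloud Storage, new in 1.5)
}
```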
PS: the 'gs' comes from the storage path used for the files, e.g.
'IMAGES_STORE': 'gs://<bucket-name>/'
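For context, a minimal settings.py sketch of the configuration this refers to, assuming Scrapy >= 1.5; the bucket name and project ID are placeholders:

```python
# settings.py -- minimal sketch of GCS-backed image storage (Scrapy >= 1.5).
# <bucket-name> and my-gcp-project-id are placeholders.
ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = 'gs://<bucket-name>/'
GCS_PROJECT_ID = 'my-gcp-project-id'  # read by Scrapy's GCSFilesStore
```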
I have researched this error but found no solution. Any help would be greatly appreciated.
Google Cloud Storage support is a new feature in Scrapy 1.5, so you need to use the scrapy:1.5-py3 stack in Scrapy Cloud.
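Assuming you deploy with shub, one way to select that stack is in scrapinghub.yml (the project ID below is a placeholder):

```yaml
# scrapinghub.yml -- minimal sketch; 12345 is a placeholder project ID.
projects:
  default: 12345
stacks:
  default: scrapy:1.5-py3
```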