Pipeline to POST items into a storage service
I want a pipeline that POSTs items to a storage service asynchronously. I considered using something like FilesPipeline for this, but FilesPipeline carries a lot of overhead because it assumes I want to save files to disk, whereas here I only want to POST the file to an API. It does, however, have a method that yields requests: get_media_requests().

Right now it fails with a FileException, and I don't know how to strip out the save-to-disk parts. Is there a way to make this work cleanly?
    import os

    from scrapy import Request
    from scrapy.pipelines.files import FilesPipeline


    class StoragePipeline(FilesPipeline):
        access_token = os.environ['access_token']

        def get_media_requests(self, item, info):
            filename = item['filename']
            headers = {
                'Authorization': f'Bearer {self.access_token}',
                'Dropbox-API-Arg': f'{{"path": "/{filename}"}}',
                'Content-Type': 'application/octet-stream',
            }
            request = Request(
                method='POST',
                url='https://content.dropboxapi.com/2/files/upload',
                headers=headers,
                body=item['data'],
            )
            yield request

        def item_completed(self, results, item, info):
            return item
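As an aside, hand-escaping the braces in the Dropbox-API-Arg f-string is easy to get wrong; serializing the header value with json.dumps is safer. A minimal sketch using only the standard library (the helper name is my own, not part of Scrapy or the Dropbox SDK):

```python
import json


def dropbox_api_arg(filename: str) -> str:
    # Build the Dropbox-API-Arg header value as proper JSON,
    # so special characters in the filename are escaped correctly.
    return json.dumps({"path": f"/{filename}"})
```

With this, the header line becomes 'Dropbox-API-Arg': dropbox_api_arg(filename).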
You can schedule scrapy requests from inside a pipeline by exposing the crawler and scheduling your requests on its engine directly:
    import scrapy


    class MyPipeline(object):
        def __init__(self, crawler):
            self.crawler = crawler

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def process_item(self, item, spider):
            if item['some_extra_field']:  # check if we already did below
                return item
            req = scrapy.Request('some_url', self.check_deploy,
                                 method='POST', meta={'item': item})
            self.crawler.engine.crawl(req, spider)
            return item

        def check_deploy(self, response):
            # if not 200 we might want to retry
            if response.status != 200:
                return response.meta['item']
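For completeness, the pipeline still has to be enabled in the project settings for process_item to run at all; a minimal fragment (the module path is hypothetical, adjust it to your project):

```python
# settings.py
ITEM_PIPELINES = {
    'myproject.pipelines.MyPipeline': 300,
}
```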