How to transfer a file to Azure Blob Storage in chunks without writing to a file, using Python
I need to transfer files from Google Cloud Storage to Azure Blob Storage.
Google gives a code snippet for downloading a file to a bytes variable, like so:
import io
from googleapiclient.http import MediaIoBaseDownload

# Get Payload Data
req = client.objects().get_media(
    bucket=bucket_name,
    object=object_name,
    generation=generation)  # optional

# The BytesIO object may be replaced with any io.Base instance.
fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=1024*1024)
done = False
while not done:
    status, done = downloader.next_chunk()
    if status:
        print 'Download %d%%.' % int(status.progress() * 100)
print 'Download Complete!'
print fh.getvalue()
I can modify this to store the data to a file by changing the fh object type like so:

fh = open(object_name, 'wb')
Then I can upload to Azure Blob Storage using blob_service.put_block_blob_from_path.
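That is, something along these lines (illustrative sketch; the legacy BlobService call takes the container name, the blob name, and a local file path):

# Hypothetical call: upload the local file we just wrote (sketch only).
blob_service.put_block_blob_from_path("container", object_name, object_name)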
I want to avoid writing a local file on the machine performing the transfer.
I gather that Google's snippet loads the data into an io.BytesIO() object one chunk at a time. I reckon I should probably use that to write to blob storage one chunk at a time as well.
I experimented with reading the whole thing into memory and then uploading with put_block_blob_from_bytes, but I got a MemoryError (the file is probably too big (~600MB)).
Any suggestions?
Having looked at the SDK source code, something like this could work:
from azure.storage.blob import _chunking
from azure.storage.blob import BlobService

# See _BlobChunkUploader
class PartialChunkUploader(_chunking._BlockBlobChunkUploader):
    def __init__(self, blob_service, container_name, blob_name, progress_callback = None):
        super(PartialChunkUploader, self).__init__(blob_service, container_name, blob_name, -1, -1, None, False, 5, 1.0, progress_callback, None)

    def process_chunk(self, chunk_offset, chunk_data):
        '''chunk_offset is the integer offset. chunk_data is an array of bytes.'''
        return self._upload_chunk_with_retries(chunk_offset, chunk_data)

blob_service = BlobService(account_name='myaccount', account_key='mykey')

uploader = PartialChunkUploader(blob_service, "container", "foo")
# while (...):
#     uploader.process_chunk(...)
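To actually drive it, one could feed it the chunks produced by the downloader from the question. The following is a sketch, not part of the original answer: it assumes that process_chunk returns the id of the block it stages via put_block (which is what _BlockBlobChunkUploader._upload_chunk appears to do in the 0.20-era SDK source), and that the staged blocks then have to be committed with put_block_list before the blob becomes readable.

import io
from googleapiclient.http import MediaIoBaseDownload

fh = io.BytesIO()
downloader = MediaIoBaseDownload(fh, req, chunksize=4 * 1024 * 1024)  # req as in the question

block_ids = []
offset = 0
done = False
while not done:
    status, done = downloader.next_chunk()           # one chunk lands in fh
    chunk = fh.getvalue()
    # Stage the chunk as a block; assumed to return the block id.
    block_ids.append(uploader.process_chunk(offset, chunk))
    offset += len(chunk)
    fh.seek(0)
    fh.truncate()                                    # discard the chunk so memory use stays flat

# Commit the staged blocks (assumed legacy-SDK signature).
blob_service.put_block_list("container", "foo", block_ids)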
According to the source code of blobservice.py for Azure Storage and of BlobReader for Google Cloud Storage, you can try to use the Azure function blob_service.put_block_blob_from_file to write the stream from GCS, because the class BlobReader has a read function that lets it act as a stream. Please see below.
Referring to the code at https://cloud.google.com/appengine/docs/python/blobstore/#Python_Using_BlobReader, you can try something like this:
from google.appengine.ext import blobstore
from azure.storage.blob import BlobService

blob_key = ...
blob_reader = blobstore.BlobReader(blob_key)

blob_service = BlobService(account_name, account_key)
container_name = ...
blob_name = ...
# BlobReader exposes read(), so it can be passed wherever a stream is expected.
blob_service.put_block_blob_from_file(container_name, blob_name, blob_reader)
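Note that BlobReader is an App Engine Blobstore API. If you are reading from GCS through the JSON API as in the question, the same idea can be applied by wrapping MediaIoBaseDownload in a minimal file-like object. The following is a sketch, assuming put_block_blob_from_file only ever calls read() on the stream it is given (which is what the legacy SDK's chunked upload path does); ChunkedDownloadStream is a hypothetical helper, not part of either SDK.

import io
from googleapiclient.http import MediaIoBaseDownload

class ChunkedDownloadStream(object):
    '''Hypothetical file-like adapter: pulls chunks from a MediaIoBaseDownload
    on demand and discards bytes once consumed, so memory use stays near one
    chunk instead of the whole object.'''

    def __init__(self, request, chunksize=1024 * 1024):
        self._buf = io.BytesIO()   # holds only not-yet-consumed bytes
        self._downloader = MediaIoBaseDownload(self._buf, request,
                                               chunksize=chunksize)
        self._done = False

    def read(self, size=-1):
        # The downloader appends at the buffer's write position, so tell()
        # is the count of unread bytes. Download until we can satisfy the
        # request (or the object is exhausted).
        while not self._done and (size < 0 or self._buf.tell() < size):
            _, self._done = self._downloader.next_chunk()
        data = self._buf.getvalue()
        if size < 0 or size >= len(data):
            consumed, rest = data, b''
        else:
            consumed, rest = data[:size], data[size:]
        # Drop what we return; keep the remainder for the next read().
        self._buf.seek(0)
        self._buf.truncate()
        self._buf.write(rest)
        return consumed

stream = ChunkedDownloadStream(req)   # req as in the question's snippet
blob_service.put_block_blob_from_file(container_name, blob_name, stream)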