Azure Python block blob storage upload is eating up all the memory
I have written a Python script that automatically builds Azure VMs on KVM and uploads them to Azure, but I am running into a problem I cannot solve.
Once the VM is built, I try to upload the disk to Azure with the Azure Python module. The problem is that the script is literally eating up all of the available RAM. I have tried several ways of coding it, but I always end up with the same result.
block_blob_service = BlockBlobService(vars.az_storage_acc_name, vars.az_sto_key)
blob = open(args.pool_path + args.name + "-az"+'.vhd', 'r')
print "Upload {} to Azure Blob service".format(args.name +"-az"+'.vhd')
block_blob_service.create_blob_from_stream(vars.az_cnt, args.name +"-az"+'.vhd', blob)
I have also tried the following:
stream = io.open('/path_to_vhd', 'rb')
BlockBlobService.create_blob_from_stream(vars.az_cnt, "test-stream.vhd", stream)
No luck: the blob creation starts every time, but it eventually fails because there is no RAM left.
Do you have any clue how I could achieve this?
This keeps the entire stream in memory, so unless your machine has that much RAM the code will not work and will at some point give you an out-of-memory exception.
I would suggest uploading the stream in chunks instead of writing it all at once.
Here is a function that uploads a stream in chunks:
def _upload_blob_chunks(blob_service, container_name, blob_name,
                        blob_size, block_size, stream, max_connections,
                        progress_callback, validate_content, lease_id, uploader_class,
                        maxsize_condition=None, if_modified_since=None, if_unmodified_since=None, if_match=None,
                        if_none_match=None, timeout=None,
                        content_encryption_key=None, initialization_vector=None, resource_properties=None):
    encryptor, padder = _get_blob_encryptor_and_padder(content_encryption_key, initialization_vector,
                                                       uploader_class is not _PageBlobChunkUploader)

    uploader = uploader_class(
        blob_service,
        container_name,
        blob_name,
        blob_size,
        block_size,
        stream,
        max_connections > 1,
        progress_callback,
        validate_content,
        lease_id,
        timeout,
        encryptor,
        padder
    )

    uploader.maxsize_condition = maxsize_condition

    # Access conditions do not work with parallelism
    if max_connections > 1:
        uploader.if_match = uploader.if_none_match = uploader.if_modified_since = uploader.if_unmodified_since = None
    else:
        uploader.if_match = if_match
        uploader.if_none_match = if_none_match
        uploader.if_modified_since = if_modified_since
        uploader.if_unmodified_since = if_unmodified_since

    if progress_callback is not None:
        progress_callback(0, blob_size)

    if max_connections > 1:
        import concurrent.futures
        from threading import BoundedSemaphore

        '''
        Ensures we bound the chunking so we only buffer and submit 'max_connections' amount of work items to the executor.
        This is necessary as the executor queue will keep accepting submitted work items, which would otherwise result in
        buffering all the blocks in memory. The max_connections + 1 ensures the next chunk is already buffered and ready
        for when a worker thread becomes available.
        '''
        chunk_throttler = BoundedSemaphore(max_connections + 1)

        executor = concurrent.futures.ThreadPoolExecutor(max_connections)
        futures = []
        running_futures = []

        # Check for exceptions and fail fast.
        for chunk in uploader.get_chunk_streams():
            for f in running_futures:
                if f.done():
                    if f.exception():
                        raise f.exception()
                    else:
                        running_futures.remove(f)

            chunk_throttler.acquire()
            future = executor.submit(uploader.process_chunk, chunk)

            # Calls callback upon completion (even if the callback was added after the Future task is done).
            future.add_done_callback(lambda x: chunk_throttler.release())
            futures.append(future)
            running_futures.append(future)

        # result() will wait until completion and also raise any exceptions that may have been set.
        range_ids = [f.result() for f in futures]
    else:
        range_ids = [uploader.process_chunk(result) for result in uploader.get_chunk_streams()]

    if resource_properties:
        resource_properties.last_modified = uploader.last_modified
        resource_properties.etag = uploader.etag

    return range_ids
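As far as I understand, you would not call this internal helper directly; for large blobs the public create_blob_from_path / create_blob_from_stream methods route through it and already upload block by block. A minimal sketch of letting the SDK do the chunking from a file path, reusing the vars/args names from your snippet; the block size and connection count are assumptions, not required values:

from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(vars.az_storage_acc_name, vars.az_sto_key)

# Smaller blocks mean less data buffered per worker thread (assumed tuning, not required).
block_blob_service.MAX_BLOCK_SIZE = 4 * 1024 * 1024

block_blob_service.create_blob_from_path(
    vars.az_cnt,                              # container name, as in your snippet
    args.name + "-az.vhd",                    # destination blob name
    args.pool_path + args.name + "-az.vhd",   # local VHD path
    max_connections=2                         # parallel block uploads (assumed)
)

With a small block size and max_connections=2, only a handful of blocks should be buffered at any time, per the chunk_throttler logic above.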
For reference, you can go through the thread below.
There is also a similar thread for the same kind of request.
Alternatively, you can upload the VHD to the storage account with PowerShell, as shown below:
$rgName = "myResourceGroup"
$urlOfUploadedImageVhd = "https://mystorageaccount.blob.core.windows.net/mycontainer/myUploadedVHD.vhd"
Add-AzVhd -ResourceGroupName $rgName -Destination $urlOfUploadedImageVhd `
-LocalFilePath "C:\Users\Public\Documents\Virtual hard disks\myVHD.vhd"
Here is a reference for the same:
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/upload-generalized-managed
Hope this helps.
Thanks for your input.
What I can't figure out is what the difference is between block_blob_service.create_blob_from_stream and block_blob_service.create_blob_from_path, if it tries to keep everything in RAM anyway?
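For what it's worth, create_blob_from_path opens the file in binary mode itself, determines its size, and then goes through the same chunked upload shown above, so in principle neither helper should need the whole VHD in memory. If you want to keep memory strictly bounded regardless, you can drive the block upload yourself with put_block / put_block_list, reading one block at a time. A minimal sketch against the same legacy BlockBlobService API; the upload_vhd_in_blocks helper, the 4 MB block size, and the reuse of your vars/args names are all assumptions for illustration:

from azure.storage.blob import BlockBlobService, BlobBlock

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB per block (assumed; tune as needed)

def upload_vhd_in_blocks(service, container, blob_name, file_path):
    """Upload a file block by block so only one block is held in memory at a time."""
    block_ids = []
    with open(file_path, 'rb') as source:
        index = 0
        while True:
            chunk = source.read(BLOCK_SIZE)
            if not chunk:
                break
            # Block ids must all have the same length; zero-pad a counter.
            block_id = '{:032d}'.format(index)
            service.put_block(container, blob_name, chunk, block_id)
            block_ids.append(BlobBlock(id=block_id))
            index += 1
    # Commit the staged blocks, in order, to finalise the blob.
    service.put_block_list(container, blob_name, block_ids)

service = BlockBlobService(vars.az_storage_acc_name, vars.az_sto_key)
upload_vhd_in_blocks(service, vars.az_cnt, args.name + "-az.vhd",
                     args.pool_path + args.name + "-az.vhd")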