boto - slow set_contents_from_filename when get_bucket validate is False
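I'm trying to upload ~3k files (1 KB each) to S3 with boto, using GreenPool.

My question:
Why does each get_bucket() call take so long, what causes the trade-off with the set_contents_from_filename() time, and how can I get around it? Thanks!

More details:
get_bucket(validate=True) takes 30 seconds on average, and the set_contents_from_filename call that follows takes less than 1 second.
I tried switching to validate=False. That brought the get_bucket() time down to under 1 second, but the set_contents_from_filename time then jumped to about 30 seconds. I couldn't find the reason for this trade-off in the boto docs.

Code: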
import logging
import time

import boto

S3_TRIES = 3  # assumed value; the original post does not show this constant


def upload(bucket_str, key_str, file_path):
    # new s3 connection
    s3 = boto.connect_s3()

    # get bucket
    bucket_time = time.time()
    b = s3.get_bucket(bucket_str, validate=True)
    logging.info('get_bucket took %f seconds' % (time.time() - bucket_time))

    # get key
    key_time = time.time()
    key = b.new_key(key_str)
    logging.info('new_key took %f seconds' % (time.time() - key_time))

    for i in range(S3_TRIES):
        try:
            up_time = time.time()
            key.set_contents_from_filename(
                file_path,
                headers={
                    "Content-Encoding": "gzip",
                    "Content-Type": "application/json",
                },
                policy='public-read')
            logging.info('set_content took %f seconds' % (time.time() - up_time))
            key.set_acl('public-read')
            return True
        except Exception as e:
            logging.info('try_set_content exception iteration - %d, %s' % (i, str(e)))
            _e = e  # keep the last error so it can be re-raised after the loop
    raise _e

You can check the documentation for get_bucket:

"If validate=False is passed, no request is made to the service (no charge/communication delay). This is only safe to do if you are sure the bucket exists.

If the default validate=True is passed, a request is made to the service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched a list of keys (but with a max limit set to 0, always returning an empty list) in the bucket (& included better error messages), at an increased expense. As of Boto v2.25.0, this now performs a HEAD request (less expensive but worse error messages)."
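
After that, the call to set_contents_from_filename has to open the S3 key for reading, so a request to S3 is made at that point. In other words, with validate=False the first round trip to the service is not avoided, only deferred to the upload call, which would explain why the delay moves instead of disappearing.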
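
Since upload() in the question opens a new connection and looks up the bucket for every file, one thing to try is creating the bucket handle once and reusing it for all uploads. The following is only a sketch under that assumption: upload_many and open_bucket are hypothetical helper names, the loop is sequential for simplicity, and whether this removes the 30-second delay depends on what actually causes it, which this thread does not establish.

```python
def upload_many(bucket, items, headers=None, policy="public-read"):
    """Upload (key_name, file_path) pairs through one shared bucket handle.

    `bucket` is expected to behave like a boto 2 Bucket: it must offer
    new_key(name), and the returned key must offer
    set_contents_from_filename(path, headers=..., policy=...).
    """
    uploaded = []
    for key_name, file_path in items:
        key = bucket.new_key(key_name)
        key.set_contents_from_filename(file_path, headers=headers, policy=policy)
        uploaded.append(key_name)
    return uploaded


def open_bucket(bucket_name):
    """Create one connection and one bucket handle (hypothetical helper)."""
    import boto  # boto 2.x, imported lazily so the sketch loads without it

    conn = boto.connect_s3()
    # validate=False: no request is made here, so only use it for a bucket
    # that is known to exist.
    return conn.get_bucket(bucket_name, validate=False)
```

Usage would look like upload_many(open_bucket("my-bucket"), [("a.json", "/tmp/a.json")], headers={"Content-Type": "application/json"}), where the bucket name and paths are placeholders.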
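
Coming back to your question about uploading a large number of files: since you tagged your question with boto3, I suggest you move to boto3 and take a look at the Transfer Manager.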
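
As a rough illustration of the boto3 route, the sketch below shares one S3 client across a thread pool and uploads each file with the client's managed upload_file method. The names upload_all, make_client and EXTRA_ARGS, and the worker count of 16, are assumptions made for this example and are not from the original post; a standard thread pool is used here in place of GreenPool.

```python
from concurrent.futures import ThreadPoolExecutor

# Same ACL and headers as in the question, expressed as upload_file ExtraArgs.
EXTRA_ARGS = {
    "ACL": "public-read",
    "ContentEncoding": "gzip",
    "ContentType": "application/json",
}


def make_client():
    """Create a single S3 client to be shared by all worker threads."""
    import boto3  # imported lazily so the sketch loads without boto3

    return boto3.client("s3")


def upload_all(client, bucket_name, items, max_workers=16):
    """Upload (key_name, file_path) pairs concurrently; returns the key names.

    Any exception raised by an upload propagates when the results are
    collected.
    """
    def _upload_one(item):
        key_name, file_path = item
        client.upload_file(file_path, bucket_name, key_name, ExtraArgs=EXTRA_ARGS)
        return key_name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the same order as `items`.
        return list(pool.map(_upload_one, items))
```

Usage would look like upload_all(make_client(), "my-bucket", [("a.json", "/tmp/a.json")]), with a placeholder bucket name and path. Retries are left out of this sketch.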