boto - slow set_content_from_file when get_bucket validate is False

I'm trying to upload ~3k files (about 1 KB each) to S3 with boto, using a GreenPool.

My question:

Why does each get_bucket() call take so long, and what causes the trade-off with the set_contents() time? How can I get around it? Thanks!

More details:

Code:

import logging
import time

import boto

S3_TRIES = 3  # number of upload attempts before giving up

def upload(bucket_str, key_str, file_path):

    # new s3 connection
    s3 = boto.connect_s3()

    # get bucket (with validate=True this issues a request to S3)
    bucket_time = time.time()
    b = s3.get_bucket(bucket_str, validate=True)
    logging.info('get_bucket Took %f seconds' % (time.time() - bucket_time))

    # get key (purely local, no request is made yet)
    key_time = time.time()
    key = b.new_key(key_str)
    logging.info('new_key Took %f seconds' % (time.time() - key_time))

    for i in range(S3_TRIES):
        try:
            up_time = time.time()
            key.set_contents_from_filename(
                file_path,
                headers={
                    "Content-Encoding": "gzip",
                    "Content-Type": "application/json",
                },
                policy='public-read')
            logging.info('set_content Took %f seconds' % (time.time() - up_time))
            key.set_acl('public-read')
            return True

        except Exception as e:
            logging.info('try_set_content exception iteration - %d, %s' % (i, str(e)))
            _e = e

    raise _e
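
For context, the uploads are driven from a GreenPool along these lines; this is only a rough sketch, assuming an eventlet GreenPool and a hypothetical files_to_upload iterable of (bucket, key, path) tuples:

import eventlet
eventlet.monkey_patch()  # assumption: sockets are monkey-patched so boto's blocking calls yield

pool = eventlet.GreenPool(size=50)  # illustrative pool size

# files_to_upload is a hypothetical iterable of (bucket_str, key_str, file_path) tuples
for bucket_str, key_str, file_path in files_to_upload:
    pool.spawn(upload, bucket_str, key_str, file_path)

pool.waitall()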

You can look at the documentation for get_bucket:

If validate=False is passed, no request is made to the service (no charge/communication delay). This is only safe to do if you are sure the bucket exists.

If the default validate=True is passed, a request is made to the service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched a list of keys (but with a max limit set to 0, always returning an empty list) in the bucket (& included better error messages), at an increased expense. As of Boto v2.25.0, this now performs a HEAD request (less expensive but worse error messages).

The subsequent call to set_contents_from_filename then has to open the S3 key, so that is when the request to S3 is actually made; the time you save on get_bucket simply shifts into set_contents_from_filename.
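
If you already know the bucket exists, you can skip the validation round trip entirely; a minimal sketch, with placeholder bucket, key, and file names:

import boto

s3 = boto.connect_s3()

# validate=False: no request is made here, boto just builds the Bucket object
b = s3.get_bucket('my-bucket', validate=False)

# the actual PUT to S3 happens inside set_contents_from_filename
key = b.new_key('path/to/key.json')
key.set_contents_from_filename('local_file.json.gz',
                               headers={'Content-Encoding': 'gzip',
                                        'Content-Type': 'application/json'},
                               policy='public-read')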

Coming back to your question about uploading a large number of files: since you tagged your question with boto3, I suggest moving to boto3 and looking at the Transfer Manager.
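
A minimal sketch of what that could look like in boto3 (the bucket name, key, file name, and TransferConfig values below are placeholders); upload_file uses the managed transfer machinery, so it handles concurrency and multipart uploads for you:

import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client('s3')

# let the transfer manager handle concurrency instead of a GreenPool
config = TransferConfig(max_concurrency=20, use_threads=True)

s3.upload_file(
    'local_file.json.gz',   # local path
    'my-bucket',            # bucket name
    'path/to/key.json',     # key
    ExtraArgs={
        'ContentEncoding': 'gzip',
        'ContentType': 'application/json',
        'ACL': 'public-read',
    },
    Config=config)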