Upload a csv file to GCP Storage, timeout error

I am using the code below to upload a file to GCP Storage, and I get a timeout error. The file is about 1 GB, so it is a large file. How can I fix this upload timeout?

  File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/requests/_request_helpers.py", line 136, in http_request
    return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
  File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 186, in wait_and_retry
    raise error
  File "/Users/xxx/opt/anaconda3/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 175, in wait_and_retry
    response = func()
  File "/Users/xxxx/opt/anaconda3/lib/python3.8/site-packages/google/auth/transport/requests.py", line 482, in request
    response = super(AuthorizedSession, self).request(
  File "/Users/xxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/xxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/Users/xxxxxx/opt/anaconda3/lib/python3.8/site-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))


import logging
import os
from pathlib import Path

from google.cloud import storage


def upload_file_to_gcs(local_filepath: str, bucket_name: str, gcs_filepath: str = None):

    if local_filepath is None:
        raise ValueError("local_filepath cannot be None")

    # os.path.isfile already returns False for paths that do not exist
    if not os.path.isfile(local_filepath):
        raise FileNotFoundError(f"{local_filepath} is not a file or does not exist.")

    if bucket_name is None:
        raise ValueError("bucket_name cannot be None")

    # bucket_exist and create_bucket are helpers defined elsewhere
    if not bucket_exist(bucket_name):
        logging.info(f"Bucket {bucket_name} does not exist. Creating...")
        create_bucket(bucket_name)

    logging.info(f"Uploading {local_filepath} to GCS...")

    # Initialise a client
    storage_client = storage.Client()

    # Default the destination path to the local file's name
    if gcs_filepath is None:
        gcs_filepath = Path(local_filepath).name

    # Create bucket object
    bucket = storage_client.get_bucket(bucket_name)

    # Upload; upload_from_filename returns None, so there is nothing to assign
    blob = bucket.blob(gcs_filepath)
    blob.upload_from_filename(local_filepath)

    logging.info(f"Uploaded {local_filepath} to {bucket_name} in GCS.")

    return vars(blob)



You can define the timeout when creating the storage client, or pass it per request. Have a look at https://googleapis.dev/python/storage/latest/retry_timeout.html
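For example, a per-request timeout can be passed straight to the upload call. A minimal sketch, assuming a placeholder bucket and file name; the library's default per-request timeout is 60 seconds, which a ~1 GB upload can easily exceed:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")   # placeholder bucket name
blob = bucket.blob("data.csv")        # placeholder destination object name

# Raise the per-request timeout (in seconds) above the 60 s default
blob.upload_from_filename("data.csv", timeout=300)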

If your internet connection is poor, you can also adjust the chunk size of the upload (although this is not recommended):

from google.cloud import storage

# These are private module attributes, so changing them affects every upload process-wide
storage.blob._DEFAULT_CHUNKSIZE = 5 * 1024 * 1024  # 5 MB
storage.blob._MAX_MULTIPART_SIZE = 5 * 1024 * 1024  # 5 MB
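If you would rather not touch private attributes, the chunk size can also be set per blob through the public chunk_size constructor parameter. A sketch with placeholder names; chunk_size must be a multiple of 256 KiB, and smaller chunks mean each individual request completes (and can time out) sooner:

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-bucket")  # placeholder bucket name

# 5 MB chunks: each 5 MB slice is sent as its own resumable-upload request
blob = bucket.blob("data.csv", chunk_size=5 * 1024 * 1024)
blob.upload_from_filename("data.csv")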