Python boto3: load a model tar file from S3 and unpack it

I am using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load into sklearn. I have been testing list_objects with a delimiter to get at the tar.gz files:

response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)


for i in response['Contents']:
    print(i['Key'])

Then I plan to extract them with:

import tarfile
tf = tarfile.open(model.read())
tf.extractall()

But how do I get the actual tar.gz file from S3 rather than some boto3 object?

You can use s3.download_file() to download an object to a local file. Your code would then look like this:

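import os
import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects matching your criteria. Keys that contain the delimiter
# ('.csv') are rolled up into CommonPrefixes, so they are left out of Contents.
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)

# Iterate over each file found and download it
# ('Contents' is missing from the response when nothing matches)
for i in response.get('Contents', []):
    key = i['Key']
    if key.endswith('/'):
        continue  # skip zero-byte "folder" placeholder keys
    dest = os.path.join('/tmp', key)
    # The key contains '/' separators, so create the local directories first
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )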
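
If you would rather not write the archive to disk first, a rough sketch of unpacking it straight from memory follows. Note that tarfile.open() expects a path or a file object, not raw bytes, which is why tarfile.open(model.read()) in the question does not work. The helper names extract_tar_gz_bytes and extract_model_from_s3 are made up for this example, and the sketch assumes each archive is small enough to fit in memory.

```python
import io
import tarfile


def extract_tar_gz_bytes(data, dest_dir):
    """Unpack gzip-compressed tar bytes into dest_dir; return the member names."""
    # Wrap the raw bytes in BytesIO so tarfile can treat them as a file object.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
        names = tf.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members,
            # such as paths that would land outside dest_dir.
            tf.extractall(path=dest_dir, filter="data")
        else:
            tf.extractall(path=dest_dir)
    return names


def extract_model_from_s3(s3_client, bucket, key, dest_dir):
    """Read one S3 object fully into memory and unpack it (hypothetical helper)."""
    # get_object returns a dict whose 'Body' is a stream with a read() method.
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"]
    return extract_tar_gz_bytes(body.read(), dest_dir)
```

With the client from the code above, this could be called as extract_model_from_s3(s3, bucket, key, '/tmp/model'). What the archive contains depends on how the model was saved by the training script; if it was written with joblib, something like joblib.load() on the extracted file should then give you the sklearn estimator.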