Python boto3: load a model tar file from S3 and unpack it
I'm using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load into sklearn. I've been testing list_objects with a delimiter to get at the tar.gz files:
response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)
for i in response['Contents']:
    print(i['Key'])
Then I was planning to extract them with:
import tarfile
tf = tarfile.open(model.read())
tf.extractall()
But how do I get the actual tar.gz file from S3 rather than some boto3 object?
You can use s3.download_file() to download the object to a local file. That would make your code look like:
import os
import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'
# List objects matching your criteria
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)
# Iterate over each file found and download it
# ('Contents' is absent from the response when nothing matches)
for i in response.get('Contents', []):
    key = i['Key']
    dest = os.path.join('/tmp', key)
    # The key contains slashes, so create the local directories first
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file", key, "from bucket", bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )
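If you would rather not write the archive to disk first, a possible alternative is to read the object into memory with get_object and hand the bytes to tarfile through a file-like wrapper. This is only a sketch: the helper names extract_tar_bytes and fetch_and_extract are made up for illustration, and it assumes each archive is small enough to fit in memory. Note that tarfile.open() takes a path or a fileobj, not raw bytes, which is why tarfile.open(model.read()) from the question does not work.

```python
import io
import os
import tarfile


def extract_tar_bytes(data, dest_dir):
    """Unpack an in-memory .tar.gz archive into dest_dir and return member names."""
    os.makedirs(dest_dir, exist_ok=True)
    # Wrap the raw bytes in BytesIO so tarfile can treat them as a file.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tf:
        names = tf.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: reject unsafe members such as
            # paths that would escape dest_dir.
            tf.extractall(path=dest_dir, filter="data")
        else:
            tf.extractall(path=dest_dir)
    return names


def fetch_and_extract(s3, bucket, key, dest_dir):
    """Read one S3 object into memory and unpack it (s3 is a boto3 S3 client)."""
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    return extract_tar_bytes(body.read(), dest_dir)
```

You would call it as fetch_and_extract(boto3.client('s3'), bucket, key, '/tmp/model') for each key from the listing. After that, the extracted model file can be loaded with whatever serializer wrote it (for example joblib.load, if the model was saved with joblib); the file name inside the archive depends on how the training script saved it.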