How to write a file or data to an S3 object using boto3
In boto 2, you can write to an S3 object using these methods:
- Key.set_contents_from_string()
- Key.set_contents_from_file()
- Key.set_contents_from_filename()
- Key.set_contents_from_stream()
Is there a boto 3 equivalent? What is the boto3 method for saving data to an object stored on S3?
In boto 3, the 'Key.set_contents_from_' methods were replaced by Object.put() and Client.put_object().
For example:
import boto3
some_binary_data = b'Here we have some data'
more_binary_data = b'Here we have some more data'
# Method 1: Object.put()
s3 = boto3.resource('s3')
object = s3.Object('my_bucket_name', 'my/key/including/filename.txt')
object.put(Body=some_binary_data)
# Method 2: Client.put_object()
client = boto3.client('s3')
client.put_object(Body=more_binary_data, Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')
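Both calls return the service response as a dictionary, which can be used to confirm the write. The following is only a minimal sketch and not part of the original answer: put_bytes is a hypothetical helper name, and client is assumed to be an S3 client such as boto3.client('s3'). boto3 normally raises an exception when a request fails, so the status check is a belt-and-braces step rather than a requirement.

```python
def put_bytes(client, bucket, key, data):
    """Write bytes to s3://bucket/key and return the ETag reported by S3.

    `client` is assumed to be an S3 client, e.g. boto3.client('s3').
    `put_bytes` is a hypothetical helper, not a boto3 function.
    """
    response = client.put_object(Bucket=bucket, Key=key, Body=data)
    # The HTTP status code lives in the response metadata.
    status = response["ResponseMetadata"]["HTTPStatusCode"]
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status} for {key}")
    return response.get("ETag")
```

Usage would look like put_bytes(boto3.client('s3'), 'my_bucket_name', 'my/key/including/filename.txt', some_binary_data).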
Alternatively, the binary data can come from reading a file, as described in the official docs comparing boto 2 and boto 3:
Storing Data
Storing data from a file, stream, or string is easy:
# Boto 2.x
from boto.s3.key import Key
key = Key('hello.txt')
key.set_contents_from_file('/tmp/hello.txt')
# Boto 3
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
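The one-liner quoted above never closes the file it opens. A small variation, sketched here under the assumption that client is an S3 client such as boto3.client('s3'), wraps the upload in a with block so the handle is closed once the request finishes. The name put_file is hypothetical.

```python
def put_file(client, bucket, key, path):
    """Upload the file at `path` to s3://bucket/key, closing it afterwards.

    `client` is assumed to be an S3 client, e.g. boto3.client('s3').
    `put_file` is a hypothetical helper, not a boto3 function.
    """
    # Open in binary mode so the bytes are sent unchanged.
    with open(path, "rb") as fh:
        return client.put_object(Bucket=bucket, Key=key, Body=fh)
```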
boto3 also has a method for uploading a file directly:
s3 = boto3.resource('s3')
s3.Bucket('bucketname').upload_file('/local/file/here.txt','folder/sub/path/to/s3key')
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.upload_file
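upload_file also accepts an ExtraArgs dictionary for object metadata. As a rough sketch (not from the original answer), the helper below guesses a Content-Type from the file name and passes it along. upload_with_content_type is a made-up name, and bucket is assumed to be a resource like boto3.resource('s3').Bucket('bucketname').

```python
import mimetypes


def upload_with_content_type(bucket, local_path, key):
    """Upload a local file via Bucket.upload_file, setting Content-Type.

    `bucket` is assumed to be a boto3 Bucket resource.
    `upload_with_content_type` is a hypothetical helper name.
    """
    # guess_type returns (type, encoding); type is None when unknown.
    content_type, _ = mimetypes.guess_type(local_path)
    extra_args = {"ContentType": content_type} if content_type else None
    bucket.upload_file(local_path, key, ExtraArgs=extra_args)
    return content_type
```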
Here's a nice trick to read JSON from s3:
import json, boto3
s3 = boto3.resource("s3").Bucket("bucket")
json.load_s3 = lambda f: json.load(s3.Object(key=f).get()["Body"])
json.dump_s3 = lambda obj, f: s3.Object(key=f).put(Body=json.dumps(obj))
Now you can use json.load_s3 and json.dump_s3 with the same API as load and dump:
data = {"test":0}
json.dump_s3(data, "key") # saves json to s3://bucket/key
data = json.load_s3("key") # read json from s3://bucket/key
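Attaching new attributes to the json module works, but it changes a shared module for every other importer. If that is a concern, the same idea can be written as two plain functions. This is only a sketch: the function names are invented here, and bucket is assumed to be a resource like boto3.resource("s3").Bucket("bucket").

```python
import json


def dump_json_s3(bucket, key, obj):
    """Serialize `obj` as JSON and store it under `key` in `bucket`."""
    body = json.dumps(obj).encode("utf-8")
    bucket.Object(key).put(Body=body, ContentType="application/json")


def load_json_s3(bucket, key):
    """Fetch the object stored under `key` and parse it as JSON."""
    # get()["Body"] is a file-like stream; read it fully before parsing.
    body = bucket.Object(key).get()["Body"]
    return json.loads(body.read())
```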
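You no longer have to convert the contents to binary before writing to a file in S3. The following example creates a new text file (called newfile.txt) in an S3 bucket with string content: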
import boto3
s3 = boto3.resource(
's3',
region_name='us-east-1',
aws_access_key_id=KEY_ID,
aws_secret_access_key=ACCESS_KEY
)
content="String content to write to a new S3 file"
s3.Object('my-bucket-name', 'newfile.txt').put(Body=content)
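If you would rather make the encoding explicit than leave it to the library, one option is to encode the string yourself and record the charset in the Content-Type. This is a sketch under assumptions: put_text is a hypothetical helper, and client is assumed to be an S3 client such as boto3.client('s3').

```python
def put_text(client, bucket, key, text, encoding="utf-8"):
    """Encode `text` and write it to s3://bucket/key as plain text.

    `client` is assumed to be an S3 client, e.g. boto3.client('s3').
    `put_text` is a hypothetical helper, not a boto3 function.
    """
    return client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode(encoding),
        # Record the charset so readers know how to decode the bytes.
        ContentType=f"text/plain; charset={encoding}",
    )
```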
A cleaner and more concise version, which I use to upload files on the fly to a given S3 bucket and sub-folder:
import boto3
BUCKET_NAME = 'sample_bucket_name'
PREFIX = 'sub-folder/'
s3 = boto3.resource('s3')
# Creating an empty file called "_DONE" and putting it in the S3 bucket
s3.Object(BUCKET_NAME, PREFIX + '_DONE').put(Body="")
Note: You should always keep your AWS credentials (aws_access_key_id and aws_secret_access_key) in a separate file, for example ~/.aws/credentials.
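The example above builds the key as PREFIX + '_DONE', which only works as intended if PREFIX ends with a slash. A tiny helper along these lines (make_key is a name made up for this sketch) makes the join tolerant of a missing or doubled slash:

```python
def make_key(prefix, name):
    """Join a prefix and an object name with exactly one '/' between them."""
    prefix = prefix.strip("/")
    name = name.lstrip("/")
    # An empty prefix means the object sits at the top level of the bucket.
    return f"{prefix}/{name}" if prefix else name
```

With this, make_key('sub-folder/', '_DONE') and make_key('sub-folder', '_DONE') both give 'sub-folder/_DONE'.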
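It is worth mentioning smart-open, which uses boto3 as a back end.
smart-open is a drop-in replacement for Python's open that can open files from s3, as well as ftp, http and many other protocols.
For example: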
from smart_open import open
import json
with open("s3://your_bucket/your_key.json", 'r') as f:
    data = json.load(f)
AWS credentials are loaded via boto3 credentials, usually a file in the ~/.aws/ directory or environment variables.
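As of 2019, you can use the code below to write, for example, an image to S3. To connect to S3 the code needs credentials; one way to set them up is to install the AWS CLI with pip install awscli and then enter your credentials with aws configure: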
import uuid
from io import BytesIO
from pathlib import Path

import boto3
import urllib3

from errors import custom_exceptions as cex  # the author's own exceptions module

BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"

class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH

    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()

    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed
        # Keep the original extension and give the file a unique name.
        extension = Path(url).suffix
        file_name = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + file_name
        try:
            self.client.upload_fileobj(image_file, self.bucket_name, final_path)
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed
        return CLOUDFRONT_BASE_URL + file_name
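One detail to watch in the code above: Path(url).suffix is applied to the whole URL, so a query string such as ?size=large ends up inside the "extension". A possible way around that, sketched here with a hypothetical helper name object_key_for_url, is to parse the URL first and take the suffix of its path only. Note that this sketch swaps in uuid4 (random) where the original uses uuid1 (time and host based); either gives a unique name.

```python
import uuid
from pathlib import PurePosixPath
from urllib.parse import urlparse


def object_key_for_url(url, base_path):
    """Return 'base_path/<random hex><extension>' for the file behind `url`."""
    # Parse the URL first so a query string or fragment is not mistaken
    # for part of the file extension.
    extension = PurePosixPath(urlparse(url).path).suffix
    return f"{base_path}/{uuid.uuid4().hex}{extension}"
```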
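After some research, I found this. It can be done with a simple csv writer, which writes a list of dictionaries as CSV directly to an S3 bucket.
For example: data_dict = [{"Key1": "value1", "Key2": "value2"}, {"Key1": "value4", "Key2": "value3"}]
This assumes that the keys in all the dictionaries are uniform.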
import csv
from io import StringIO
import boto3
# Sample input dictionary
data_dict = [{"Key1": "value1", "Key2": "value2"}, {"Key1": "value4", "Key2": "value3"}]
data_dict_keys = data_dict[0].keys()
# creating a file buffer
file_buff = StringIO()
# writing csv data to file buffer
writer = csv.DictWriter(file_buff, fieldnames=data_dict_keys)
writer.writeheader()
for data in data_dict:
    writer.writerow(data)
# creating s3 client connection
client = boto3.client('s3')
# placing file to S3, file_buff.getvalue() is the CSV body for the file
client.put_object(Body=file_buff.getvalue(), Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')
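If the keys are not uniform across the dictionaries, DictWriter raises an error for any key missing from fieldnames. One way to relax the assumption, shown only as a sketch with a made-up function name rows_to_csv, is to collect the union of keys first and let restval fill the gaps:

```python
import csv
from io import StringIO


def rows_to_csv(rows):
    """Render a list of dicts as CSV text, tolerating missing keys."""
    # Collect field names in first-seen order across all rows.
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = StringIO()
    # restval is written for any field a row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string can then be passed as Body to client.put_object, exactly like file_buff.getvalue() above.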