How to find size of a folder inside an S3 bucket?
I am using the boto3 module in Python to interact with S3, and I am currently able to get the size of every individual key in an S3 bucket. But what I actually need is the storage used by the top-level folders only (each folder is a different project), because we have to charge each project for the storage it uses. I can get the names of the top-level folders, but I cannot get any size details for a folder with the implementation below. This is how I get the top-level folder names:
import boto
import boto.s3.connection

AWS_ACCESS_KEY_ID = "access_id"
AWS_SECRET_ACCESS_KEY = "secret_access_key"
Bucketname = 'Bucket-name'

conn = boto.s3.connect_to_region('ap-south-1',
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    is_secure=True,  # uncomment if you are not using ssl
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )

bucket = conn.get_bucket('bucket')
folders = bucket.list("", "/")
for folder in folders:
    print(folder.name)
The type of folder here is boto.s3.prefix.Prefix, and it does not expose any size details. Is there a way to search for a folder/object in an S3 bucket by name and then get the size of that object?
def find_size(name, conn):
    # Sum the size of every key in the bucket and report it in GB.
    for bucket in conn.get_all_buckets():
        if name == bucket.name:
            total_bytes = 0
            for key in bucket:
                total_bytes += key.size
            total_bytes = total_bytes / 1024 / 1024 / 1024
            print(total_bytes)
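For reference, the same idea can be restricted to a single "folder" by listing only the keys under a prefix with boto's bucket.list. A minimal sketch, assuming the conn object created above (the function name and the "my-project/" prefix are only illustrative):

def find_folder_size(bucket_name, prefix, conn):
    # Sum the sizes of all keys whose names start with the given prefix.
    bucket = conn.get_bucket(bucket_name)
    total_bytes = 0
    for key in bucket.list(prefix=prefix):
        total_bytes += key.size
    return total_bytes

print(find_folder_size('Bucket-name', 'my-project/', conn))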
To find the size of the top-level "folders" in S3 (S3 does not really have a concept of folders, but it does present a folder-like structure in the UI), something like this will work:
from boto3 import client

conn = client('s3')
top_level_folders = dict()

for key in conn.list_objects(Bucket='kitsune-buildtest-production')['Contents']:
    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))
    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))
To get the size of an S3 folder, objects (accessible via boto3.resource('s3').Bucket) provide the filter(Prefix) method, which retrieves ONLY the files matching the prefix condition, making this fairly efficient.
import boto3

def get_size(bucket, path):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    total_size = 0
    for obj in my_bucket.objects.filter(Prefix=path):
        total_size = total_size + obj.size
    return total_size
Say you want to get the size of the folder s3://my-bucket/my/path/; you would then call the previous function like this:

get_size("my-bucket", "my/path/")

This is then, of course, also easy to apply to top-level folders, as sketched below.
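A minimal sketch of that idea, assuming a hypothetical bucket name "my-bucket": list the top-level prefixes with Delimiter='/', then sum each one with the same filter(Prefix) approach as get_size above.

import boto3

def top_level_folder_sizes(bucket):
    # Sum object sizes per top-level prefix ("folder") in a bucket.
    s3 = boto3.resource('s3')
    client = boto3.client('s3')
    sizes = {}
    # CommonPrefixes lists the top-level "folders" when Delimiter='/' is used.
    # Note: a single list_objects_v2 call returns at most 1000 entries;
    # use a paginator (as in a later answer) if you have more prefixes.
    resp = client.list_objects_v2(Bucket=bucket, Delimiter='/')
    for cp in resp.get('CommonPrefixes', []):
        prefix = cp['Prefix']  # e.g. "my-project/"
        sizes[prefix] = sum(
            obj.size for obj in s3.Bucket(bucket).objects.filter(Prefix=prefix)
        )
    return sizes

print(top_level_folder_sizes("my-bucket"))  # hypothetical bucket name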
Not using boto3, just the aws cli, but this quick one-liner does the trick. I usually add a tail -1 to get only the summary folder size. It can be a bit slow, though, for folders with many objects.
aws s3 ls --summarize --human-readable --recursive s3://bucket-name/folder-name | tail -1
To get more than 1000 objects from S3, use list_objects_v2 with a paginator, like this:
from boto3 import client

conn = client('s3')
top_level_folders = dict()

paginator = conn.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')

index = 1
for page in pages:
    for key in page['Contents']:
        print(key['Size'])
        folder = key['Key'].split('/')[index]
        print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))
        if folder in top_level_folders:
            top_level_folders[folder] += key['Size']
        else:
            top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    size_in_gb = size / (1024 * 1024 * 1024)
    print("Folder: %s, size: %.2f GB" % (folder, size_in_gb))
If the prefix is notes/ and the delimiter is a slash (/), as in notes/summer/july, the common prefix is notes/summer/. In case the prefix is "notes/", use index = 1; if it is "notes/summer/", use index = 2.
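To illustrate that index choice, a small hedged example (the key name is made up):

key = "notes/summer/july/report.txt"
parts = key.split('/')   # ['notes', 'summer', 'july', 'report.txt']
print(parts[1])          # 'summer' -> use index = 1 when Prefix is "notes/"
print(parts[2])          # 'july'   -> use index = 2 when Prefix is "notes/summer/"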