How to find size of a folder inside an S3 bucket?

I am using the boto3 module in Python to interact with S3, and currently I'm able to get the size of every individual key in an S3 bucket. But my goal is to find the storage space used by just the top-level folders (each folder is a different project), because we need to charge each project for the space it uses. I'm able to get the names of the top-level folders, but I can't get any details about folder sizes with the implementation below. Here is how I fetch the top-level folder names:

import boto
import boto.s3.connection

AWS_ACCESS_KEY_ID = "access_id"
AWS_SECRET_ACCESS_KEY = "secret_access_key"
Bucketname = 'Bucket-name' 

conn = boto.s3.connect_to_region('ap-south-1',
   aws_access_key_id=AWS_ACCESS_KEY_ID,
   aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
   is_secure=True, # use SSL/HTTPS
   calling_format = boto.s3.connection.OrdinaryCallingFormat(),
   )

bucket = conn.get_bucket(Bucketname)
folders = bucket.list("", "/")

for folder in folders:
    print(folder.name)

The folder objects here are of type boto.s3.prefix.Prefix and do not expose any size details. Is there a way to search for a folder/object in an S3 bucket by name and then get the size of that object?

def find_size(name, conn):
    # Sums the size of every key in the named bucket; note this totals
    # the whole bucket rather than a single folder.
    for bucket in conn.get_all_buckets():
        if name == bucket.name:
            total_bytes = 0
            for key in bucket:
                total_bytes += key.size
            total_gb = total_bytes / 1024 / 1024 / 1024  # bytes -> GB
            print(total_gb)

To find the size of the top-level "folders" in S3 (S3 does not really have a concept of folders, but sort of displays a folder structure in the UI), something like this will work:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

for key in conn.list_objects(Bucket='kitsune-buildtest-production')['Contents']:

    folder = key['Key'].split('/')[0]
    print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

    if folder in top_level_folders:
        top_level_folders[folder] += key['Size']
    else:
        top_level_folders[folder] = key['Size']


for folder, size in top_level_folders.items():
    print("Folder: %s, size: %d" % (folder, size))

To get the size of an S3 "folder", objects (accessible from boto3.resource('s3').Bucket) provide the method filter(Prefix), which lets you retrieve ONLY the files matching the prefix condition, and makes it rather optimised.

import boto3

def get_size(bucket, path):
    s3 = boto3.resource('s3')
    my_bucket = s3.Bucket(bucket)
    total_size = 0

    for obj in my_bucket.objects.filter(Prefix=path):
        total_size = total_size + obj.size

    return total_size

Let's say you want to get the size of the folder s3://my-bucket/my/path/; then you can call the previous function like this:

get_size("my-bucket", "my/path/")

This of course also easily applies to top-level folders:
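
For example, here is a small sketch looping get_size over each project prefix. The prefix names are hypothetical; in practice you could collect them from a CommonPrefixes listing like the one sketched earlier:

# Hypothetical top-level project prefixes.
for prefix in ["projectA/", "projectB/"]:
    size_gb = get_size("my-bucket", prefix) / (1024 ** 3)  # bytes -> GB
    print("%s: %.2f GB" % (prefix, size_gb))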

Not using boto3, just the aws cli, but this quick one-liner does the trick. I usually put a tail -1 to get only the summary folder size. It can be a bit slow for folders with many objects, though.

aws s3 ls --summarize --human-readable --recursive s3://bucket-name/folder-name | tail -1

To get more than 1000 objects from S3, use list_objects_v2 with a paginator:

from boto3 import client
conn = client('s3')

top_level_folders = dict()

paginator = conn.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket='bucket', Prefix='prefix')
index = 1  # depth of the "folder" component in each key; see the note below
for page in pages:
    for key in page.get('Contents', []):  # 'Contents' is absent on empty pages
        folder = key['Key'].split('/')[index]
        print("Key %s in folder %s. %d bytes" % (key['Key'], folder, key['Size']))

        if folder in top_level_folders:
            top_level_folders[folder] += key['Size']
        else:
            top_level_folders[folder] = key['Size']

for folder, size in top_level_folders.items():
    size_in_gb = size/(1024*1024*1024)
    print("Folder: %s, size: %.2f GB" % (folder, size_in_gb))

If the prefix is notes/ and the delimiter is a slash (/), as in notes/summer/july, then the common prefix is notes/summer/. So when the prefix is "notes/", use index = 1; when it is "notes/summer/", use index = 2.
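
As a quick illustration of that index rule (the key below is hypothetical):

# Hypothetical key, showing which split index picks the "folder" name.
key = "notes/summer/july/report.txt"
parts = key.split('/')  # ['notes', 'summer', 'july', 'report.txt']
print(parts[1])         # 'summer' -> index = 1 for Prefix "notes/"
print(parts[2])         # 'july'   -> index = 2 for Prefix "notes/summer/"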