How to create directories in Azure storage container without creating extra files?

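I've written Python code to create a range of folders and subfolders (for a data lake) in an Azure storage container. The code works and is based on the Microsoft Azure documentation. One thing, though: I'm creating a dummy 'txt' file in each folder in order to create the directory (which I can clean up later). Is there a way to create the folders and subfolders without creating a file? I understand that folders in Azure container storage are not hierarchical and are instead metadata, so what I'm asking for may not be possible.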

from azure.storage.blob import BlobServiceClient, ContainerClient

# load_config() is the asker's own helper (not shown); it returns a dict of settings.
config = load_config()
connection_string = config['azure_storage_connectionstring']
gen2_container_name = config['gen2_container_name']
container_client = ContainerClient.from_connection_string(connection_string, gen2_container_name)
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

# blob_service_client.create_container(gen2_container_name)


def create_folder(folder, sub_folder):
    # Uploading a dummy blob is what makes the "folder" path appear.
    blob_client = container_client.get_blob_client('{}/{}/start_here.txt'.format(folder, sub_folder))

    with open('test.txt', 'rb') as data:
        blob_client.upload_blob(data)


def create_all_folders():
    folder_list = config['folder_list']
    sub_folder_list = config['sub_folder_list']
    for folder in folder_list:
        for sub_folder in sub_folder_list:
            try:
                create_folder(folder, sub_folder)
            except Exception as e:
                # Include the exception so the real cause is visible.
                print('Looks like something went wrong here trying to create this folder structure {}/{}. Maybe the structure already exists? ({})'.format(folder, sub_folder, e))

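No, this is not possible with Blob Storage. There is no way to create a so-called "folder" on its own.

But you can use the Data Lake SDK to create a directory, like this: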

from azure.storage.filedatalake import DataLakeServiceClient

connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
myfilesystem = "test"
myfolder = "test1111111111"
myfile = "FileName.txt"

file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
directory_client = file_system_client.create_directory(myfolder)

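Just to add some context: the reason this isn't possible in Blob Storage is that folders/directories aren't "real". Folders do not exist as standalone objects; they are only defined as part of a blob's name.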

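For example, if you have a folder "mystuff" containing a file (blob) "somefile.txt", the blob name actually includes the folder name and the "/" character: mystuff/somefile.txt. The blob lives directly inside the container, not inside a folder. This naming convention can be nested many times within a blob name, such as folder1/folder2/mystuff/anotherfolder/somefile.txt, but that blob still lives directly in the container.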
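
To make the naming convention concrete, here is a small pure-Python sketch that derives the apparent "folders" from a flat list of blob names. It does not call Azure at all; the function name `virtual_folders` and the sample names are made up for illustration.

```python
def virtual_folders(blob_names, prefix="", delimiter="/"):
    """Return the sorted 'folder' names that appear directly under prefix.

    Nothing here is a real directory: each result is just the part of a
    blob name up to the next delimiter.
    """
    folders = set()
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter in rest:
            # Keep only the first path segment after the prefix.
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
    return sorted(folders)


blob_names = [
    "mystuff/somefile.txt",
    "folder1/folder2/mystuff/anotherfolder/somefile.txt",
    "rootfile.txt",
]

print(virtual_folders(blob_names))                     # ['folder1/', 'mystuff/']
print(virtual_folders(blob_names, prefix="folder1/"))  # ['folder1/folder2/']
```

Remove every blob whose name starts with "mystuff/" from the list and the "mystuff/" folder disappears with it, which is why an empty folder cannot exist in plain Blob Storage.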

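Folders may appear to exist in some tools (such as Azure Storage Explorer) because the SDK allows filtering on blob names: if you filter on the "/" character, you can mimic the appearance of a folder and its contents by name. But for a folder to appear to exist, there must be blobs in the container with the appropriate names. If you want to "force" a folder to exist, you can create a 0-byte blob with the right folder path in its name, but the blob artifact still has to exist.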
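
A minimal sketch of the 0-byte blob approach, assuming the `azure-storage-blob` package and an existing `ContainerClient` such as the one in the question. The helper name `create_placeholder_blob` and the marker name `.keep` are hypothetical choices, not anything Azure prescribes.

```python
def create_placeholder_blob(container_client, folder_path, marker_name=".keep"):
    """Upload an empty blob so that folder_path shows up in tools.

    container_client is expected to be an azure.storage.blob.ContainerClient.
    marker_name is an arbitrary (hypothetical) name for the empty blob.
    """
    blob_name = "{}/{}".format(folder_path.strip("/"), marker_name)
    # Empty payload: the blob exists but holds 0 bytes.
    # overwrite=True makes re-running the script harmless.
    container_client.upload_blob(name=blob_name, data=b"", overwrite=True)
    return blob_name


# Usage with the client from the question (not executed here):
# create_placeholder_blob(container_client, "folder1/folder2")
```

This still creates a blob per folder, so it only shrinks the dummy file to nothing rather than avoiding it.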

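The exception is Azure Data Lake Storage (ADLS) Gen 2, which is Blob Storage that implements a Hierarchical Namespace. This makes it behave more like a file system, so it treats directories as standalone objects. ADLS is built on top of Blob Storage, so the two have a lot in common. If you absolutely must have empty directories, ADLS is the way to go.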
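
If you go the ADLS route, the nested loop from the question could be adapted roughly as follows. This is a sketch that assumes a storage account with the hierarchical namespace enabled and a `FileSystemClient` from `azure-storage-file-datalake`; the helper name `create_directories` is made up, and it relies on the service accepting a nested path such as "folder/sub_folder" in a single `create_directory` call, which you should verify against your account.

```python
def create_directories(file_system_client, folders, sub_folders):
    """Create folder/sub_folder directories without any placeholder files.

    file_system_client is expected to be an
    azure.storage.filedatalake.FileSystemClient.
    """
    created = []
    for folder in folders:
        for sub_folder in sub_folders:
            path = "{}/{}".format(folder, sub_folder)
            # Directories are real objects here, so no dummy blob is needed.
            file_system_client.create_directory(path)
            created.append(path)
    return created


# Usage (not executed here), with names taken from the question's config:
# create_directories(file_system_client, config['folder_list'], config['sub_folder_list'])
```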