Iterating through a nested list/dictionary in Python

I am trying to parse a YAML file - https://github.com/open-telemetry/opentelemetry-specification/blob/master/semantic_conventions/resource/cloud.yaml

I am using the following code:

import yaml  # PyYAML

with open('cloud.yaml') as f:
    my_dict = yaml.safe_load(f)

print(my_dict)

which produces the following dictionary:

{'groups': [{'id': 'cloud', 'prefix': 'cloud', 'brief': 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n', 'attributes': [{'id': 'provider', 'type': {'allow_custom_values': True, 'members': [{'id': 'AWS', 'value': 'aws', 'brief': 'Amazon Web Services'}, {'id': 'Azure', 'value': 'azure', 'brief': 'Microsoft Azure'}, {'id': 'GCP', 'value': 'gcp', 'brief': 'Google Cloud Platform'}]}, 'brief': 'Name of the cloud provider.\n', 'examples': 'gcp'}, {'id': 'account.id', 'type': 'string', 'brief': 'The cloud account ID used to identify different entities.\n', 'examples': ['opentelemetry']}, {'id': 'region', 'type': 'string', 'brief': 'A specific geographical location where different entities can run.\n', 'examples': ['us-central1']}, {'id': 'zone', 'type': 'string', 'brief': 'Zones are a sub set of the region connected through low-latency links.\n', 'note': 'In AWS, this is called availability-zone.\n', 'examples': ['us-central1-a']}]}]}

I want to iterate over the elements and extract the following values:

  1. id - cloud
  2. all attributes -> id - provider; id - account.id; id - region; id - zone
  3. members - aws, azure, gcp

I am trying to iterate over all the key-value pairs with the following code:

for groups in my_dict.values():
    print(groups)

The output is:

[{'id': 'cloud', 'prefix': 'cloud', 'brief': 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n', 'attributes': [{'id': 'provider', 'type': {'allow_custom_values': True, 'members': [{'id': 'AWS', 'value': 'aws', 'brief': 'Amazon Web Services'}, {'id': 'Azure', 'value': 'azure', 'brief': 'Microsoft Azure'}, {'id': 'GCP', 'value': 'gcp', 'brief': 'Google Cloud Platform'}]}, 'brief': 'Name of the cloud provider.\n', 'examples': 'gcp'}, {'id': 'account.id', 'type': 'string', 'brief': 'The cloud account ID used to identify different entities.\n', 'examples': ['opentelemetry']}, {'id': 'region', 'type': 'string', 'brief': 'A specific geographical location where different entities can run.\n', 'examples': ['us-central1']}, {'id': 'zone', 'type': 'string', 'brief': 'Zones are a sub set of the region connected through low-latency links.\n', 'note': 'In AWS, this is called availability-zone.\n', 'examples': ['us-central1-a']}]}]

I want to print every value separately, e.g. cloud, A cloud infrastructure (e.g. GCP, Azure, AWS)\n, etc.

The output I need is:

cloud, A cloud infrastructure (e.g. GCP, Azure, AWS).
cloud.provider,, Name of the cloud provider.
cloud.provider.member, AWS, Amazon Web Services
cloud.provider.member, azure, Microsoft Azure
cloud.provider.member, GCP, Google Cloud Platform
cloud.account.id, string, The cloud account ID used to identify different entities.
cloud.region, string, A specific geographical location where different entities can run.
...
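The loop above prints one big list because the top-level dictionary has a single key, 'groups', so my_dict.values() yields exactly one value: the list of group dicts. A minimal sketch of going one level deeper, using a trimmed copy of the parsed data (only a few keys kept, for illustration):

```python
# Trimmed copy of the parsed data (only the first few keys of the group)
my_dict = {'groups': [{'id': 'cloud', 'prefix': 'cloud',
                       'brief': 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n'}]}

top_values = list(my_dict.values())
print(len(top_values))           # 1: the list stored under 'groups'

for groups in my_dict.values():  # runs once; groups is a list
    for group in groups:         # each element of that list is a dict
        for key, value in group.items():
            print(key, repr(value))
```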

Here is your output dictionary, reformatted to be readable:

myDict = {
    'groups': [
        {
            'id': 'cloud',
            'prefix': 'cloud',
            'brief': 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n',
            'attributes': [
                {
                    'id': 'provider',
                    'type': {
                        'allow_custom_values': True,
                        'members': [
                            {
                                'id': 'AWS',
                                'value': 'aws',
                                'brief': 'Amazon Web Services'
                            },
                            {
                                'id': 'Azure',
                                'value': 'azure',
                                'brief': 'Microsoft Azure'
                            },
                            {
                                'id': 'GCP',
                                'value': 'gcp',
                                'brief': 'Google Cloud Platform'
                            }
                        ]
                    },
                    'brief': 'Name of the cloud provider.\n',
                    'examples': 'gcp'
                },
                {
                    'id': 'account.id',
                    'type': 'string',
                    'brief': 'The cloud account ID used to identify different entities.\n',
                    'examples': ['opentelemetry']
                },
                {
                    'id': 'region',
                    'type': 'string',
                    'brief': 'A specific geographical location where different entities can run.\n',
                    'examples': ['us-central1']
                },
                {
                    'id': 'zone',
                    'type': 'string',
                    'brief': 'Zones are a sub set of the region connected through low-latency links.\n',
                    'note': 'In AWS, this is called availability-zone.\n',
                    'examples': ['us-central1-a']
                }
            ]
        }
    ]
}
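Instead of indenting the dictionary by hand, the standard library can print a nested structure readably. A small sketch, using a hypothetical excerpt of the data just to show the formatting:

```python
import json
from pprint import pprint

# Hypothetical small excerpt of the parsed data, only to demonstrate the formatting
my_dict = {'groups': [{'id': 'cloud', 'prefix': 'cloud',
                       'attributes': [{'id': 'region', 'type': 'string'}]}]}

# pprint wraps nested containers that do not fit within `width` characters
pprint(my_dict, width=40)

# json.dumps with indent puts each key on its own indented line
readable = json.dumps(my_dict, indent=4)
print(readable)
```

Note that the JSON view shows Python's True as true and uses double quotes, so it is for reading only, not for pasting back as Python.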

Now the structure is clear:

for v in myDict['groups'][0].items():
    print(v)

Output:

('id', 'cloud')
('prefix', 'cloud')
('brief', 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n')
('attributes', [{'id': 'provider', 'type': {'allow_custom_values': True, 'members': [{'id': 'AWS', 'value': 'aws', 'brief': 'Amazon Web Services'}, {'id': 'Azure', 'value': 'azure', 'brief': 'Microsoft Azure'}, {'id': 'GCP', 'value': 'gcp', 'brief': 'Google Cloud Platform'}]}, 'brief': 'Name of the cloud provider.\n', 'examples': 'gcp'}, {'id': 'account.id', 'type': 'string', 'brief': 'The cloud account ID used to identify different entities.\n', 'examples': ['opentelemetry']}, {'id': 'region', 'type': 'string', 'brief': 'A specific geographical location where different entities can run.\n', 'examples': ['us-central1']}, {'id': 'zone', 'type': 'string', 'brief': 'Zones are a sub set of the region connected through low-latency links.\n', 'note': 'In AWS, this is called availability-zone.\n', 'examples': ['us-central1-a']}])
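Since 'attributes' is a list of dicts, and members only exist where an attribute's 'type' is a dict rather than a plain string, the three values listed in the question can be collected with comprehensions. A minimal sketch on a trimmed copy of the data (only the keys used here):

```python
# Trimmed copy of the parsed cloud.yaml structure (only the keys used here)
my_dict = {
    'groups': [{
        'id': 'cloud',
        'attributes': [
            {'id': 'provider',
             'type': {'members': [{'id': 'AWS', 'value': 'aws'},
                                  {'id': 'Azure', 'value': 'azure'},
                                  {'id': 'GCP', 'value': 'gcp'}]}},
            {'id': 'account.id', 'type': 'string'},
            {'id': 'region', 'type': 'string'},
            {'id': 'zone', 'type': 'string'},
        ],
    }]
}

group = my_dict['groups'][0]                             # 'groups' is a list; take its first dict
group_id = group['id']                                   # 1. 'cloud'
attribute_ids = [a['id'] for a in group['attributes']]  # 2. every attribute id
# 3. member values, only from attributes whose 'type' is a dict
member_values = [
    m['value']
    for a in group['attributes']
    if isinstance(a['type'], dict)
    for m in a['type']['members']
]

print(group_id)
print(attribute_ids)
print(member_values)
```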

Now extract the data like this (you could also collect all of the values in a single for loop):

data = myDict['groups'][0]
group_id = data['id']              # avoid shadowing the built-in id()
brief = data['brief']
attr = data['attributes']
mems = attr[0]['type']['members']  # the 'provider' attribute holds the members

print(f"{group_id},{brief}")

for member in mems:
    print(f"cloud.provider.member.{member['value']}, {member['brief']}")

Output:

cloud,A cloud infrastructure (e.g. GCP, Azure, AWS)

cloud.provider.member.aws, Amazon Web Services
cloud.provider.member.azure, Microsoft Azure
cloud.provider.member.gcp, Google Cloud Platform
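The blank line after the first output line comes from the trailing '\n' stored in the 'brief' value. A small sketch showing that str.rstrip() removes it:

```python
data = {'id': 'cloud', 'brief': 'A cloud infrastructure (e.g. GCP, Azure, AWS)\n'}

raw = f"{data['id']},{data['brief']}"             # keeps the trailing newline
clean = f"{data['id']},{data['brief'].rstrip()}"  # trailing whitespace removed

print(repr(raw))
print(clean)
```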

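It can also be done generically, by checking whether the value under 'type' is a dict instance.

Assume the variable parsed_dict holds the result of parsing the YAML file: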

def remove_end_of_line_char(line_text):
    # Drop a single trailing newline (the 'brief' strings in the parsed data end with '\n')
    if len(line_text) > 0 and line_text[-1] == '\n':
        line_text = line_text[:-1]

    return line_text


data_groups = parsed_dict["groups"]
for group in data_groups:
    # Group line, e.g. "cloud, A cloud infrastructure (e.g. GCP, Azure, AWS)"
    msg = remove_end_of_line_char(f"{group['id']}, {group['brief']}")
    print(msg)
    attributes_list = group["attributes"]
    for attribute in attributes_list:
        attr_type = attribute['type']
        if isinstance(attr_type, dict):
            # Enum-like attribute: 'type' is a dict that lists the allowed members
            print(f"{group['id']}.{attribute['id']},, {remove_end_of_line_char(attribute['brief'])}")
            cloud_provider_member_prefix = f"{group['id']}.{attribute['id']}.member, "
            for member in attr_type['members']:
                print(f"{cloud_provider_member_prefix}{member['id']}, {member['brief']}")
        else:
            # Plain attribute: 'type' is a string such as 'string'
            msg = remove_end_of_line_char(f"{group['id']}.{attribute['id']}, {attribute['type']}, {attribute['brief']}")
            print(msg)
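If you do not know the shape of the data in advance, a recursive generator can walk any mix of nested dicts and lists and yield every leaf value with its path. This is a generic sketch, not tied to the OpenTelemetry schema; the function name walk and the dotted/bracket path format are illustrative choices:

```python
from typing import Any, Iterator, Tuple


def walk(node: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf_value) for every non-container value in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, f"{path}[{index}]")
    else:
        yield path, node


# Trimmed sample of the parsed data (hypothetical excerpt)
sample = {'groups': [{'id': 'cloud',
                      'attributes': [{'id': 'provider',
                                      'type': {'members': [{'id': 'AWS', 'value': 'aws'}]}},
                                     {'id': 'region', 'type': 'string'}]}]}

for p, v in walk(sample):
    print(p, '=', v)
```

Each printed path, such as groups[0].attributes[0].type.members[0].value, can then be filtered or renamed to produce whatever output format you need.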