如何最好地按此对象中的值提取子词典?

How to best extract sub dictionaries by value in this object?

我正在处理一个数据库,其中有人创建了一个 PHP ArrayObject,在创建之前几乎没有任何检查。

我可以使用 python phpserialize 库的反序列化模块,所以它看起来像这样:

{0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 1: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}',
 3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "4"}}',
 4: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": 0}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 5: '{"0": {"0": "location", "1": "3"}, "1": {"0": "sizing", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',

...

56: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "1"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 57: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 2}, "2": {"0": "design", "1": 1}, "3": {"0": "maintenance", "1": 2}, "4": {"0": "environmental_stewardship", "1": 2}}',
 58: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "4"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 59: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 60: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 61: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 62: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "1"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 63: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 64: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 65: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}'}

问题是我需要一种方法来提取具有相同值的子词典(例如,所有具有 "visual_impact" 或 "color" 等的子词典)。但是,由于这些子词典在整个对象中都没有与相同的键配对,所以这似乎是不可能的。

我认为重新分配键名以与值对齐是可行的。

所以,例如

dict = {"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}

会变成

dict = {"0": {"0": "color", "1": 3}, "4": {"0": "plant_variety", "1": 3}, "1": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "2": {"0": "environmental_stewardship", "1": 4}}

因此,对于 dict["0"],我希望在子 dictionary/value 中始终具有 "color",dict["1"] 将始终具有 "design",等等。因此,对于我上面的示例字典,dict["0"] 会给出 {"0": "color", "1": 3}dict["1"] 会给出 {"0": "design", "1": 4},等等

因此,我试图根据 value/sub 字典中的内容重新分配键。键“0”在子 dictionary/value 中始终具有 "color",键“1”始终具有 "design",对于上面列出的整个字典,依此类推。

我找到了这个 change-the-name-of-a-key-in-dictionary,但是这个对象在如何做到这一点方面令人困惑,因为这取决于 value/sub 词典的内容。

我知道我必须确保在执行此操作之前将值(例如 'use_of_color' 更改为 'color' 等)统一命名,但这不应该是问题。我只需要一种方法来确保我总是通过相同的键提取值为 'color' 的子字典,而我能看到的唯一方法是重新分配键。

如果有更好的方法来处理这个问题,我愿意接受建议。

我假设您想要根据键“0”的值对子词典进行分组,即 'location'、'environmental_stewardship' 等。但实际上,您不需要根本没有子字典,你有字典文字的字符串。如果你的字典被命名为 horrible_mess,你可以使用这个快速技巧:

>>> from ast import literal_eval
>>> still_messy  = {k:literal_eval(v) for k,v in horrible_mess.items()}

那么,简单地执行以下操作可能是最简单的:

>>> from collections import defaultdict
>>> grouped = defaultdict(list)
>>> for sub in still_messy.values():
...     for d in sub.values():
...         grouped[d['0']].append(d)
... 
>>> grouped['visual_appeal']
[{'1': '4', '0': 'visual_appeal'}, {'1': '3', '0': 'visual_appeal'}]
>>> grouped['environmental_stewardship']
[{'1': '3', '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': 4, '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': 2, '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}]
>>> 

请告诉我这是否是您期望的输出,如果不是,请在您的问题中包含预期的输出?我很抱歉代码很丑陋,但它基本上只是循环遍历所有子词典,只考虑无法转换为 int 的值(这是一个 hack),并将它们用作新词典中的键。

代码:

from ast import literal_eval
data = {0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 1: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}',
 3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "4"}}',
 4: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": 0}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 5: '{"0": {"0": "location", "1": "3"}, "1": {"0": "sizing", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',
56: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "1"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 57: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 2}, "2": {"0": "design", "1": 1}, "3": {"0": "maintenance", "1": 2}, "4": {"0": "environmental_stewardship", "1": 2}}',
 58: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "4"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 59: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 60: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 61: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 62: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "1"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 63: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 64: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 65: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}'}


sep_by_type = {}
for key1,val1 in data.iteritems():
    val1 = literal_eval(val1) #because val1 is a string not a dict
    for key2,val2 in val1.iteritems():
        for key3,val3 in val2.iteritems():
            try:
                int(val3)
            except:
                if val3 not in sep_by_type:
                    sep_by_type[val3] = [val2]
                else:
                    sep_by_type[val3].append(val2)

for sep_key in sep_by_type:
    print sep_key,sep_by_type[sep_key]
    print ""

输出

sizing [{'1': '4', '0': 'sizing'}, {'1': '3', '0': 'sizing'}]

plant_variety [{'1': '2', '0': 'plant_variety'}, {'1': '2', '0': 'plant_variety'}, {'1': 3, '0': 'plant_variety'}, {'1': 2, '0': 'plant_variety'}, {'1': '4', '0': 'plant_variety'}, {'1': '4', '0': 'plant_variety'}]

maintenance [{'1': '2', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}, {'1': 4, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': 0, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': 2, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '1', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}]

use_of_color [{'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}]

color [{'1': 3, '0': 'color'}, {'1': 3, '0': 'color'}]

plant_variety_and_health [{'1': '4', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '2', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}]

visual_appeal [{'1': '4', '0': 'visual_appeal'}, {'1': '3', '0': 'visual_appeal'}]

design [{'1': '2', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': 4, '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '1', '0': 'design'}, {'1': 1, '0': 'design'}, {'1': '4', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '2', '0': 'design'}, {'1': '3', '0': 'design'}]

location [{'1': '4', '0': 'location'}, {'1': '3', '0': 'location'}]

environmental_stewardship [{'1': '3', '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': 4, '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': 2, '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}]

visual_impact [{'1': '3', '0': 'visual_impact'}, {'1': '3', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}, {'1': '2', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}, {'1': '3', '0': 'visual_impact'}, {'1': '2', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}]

更新 我使用 literal_eval 而不是更安全的 eval。 (感谢juanpa.arrivillaga!)