Remove duplicates from a list of dictionaries (with a unique value)

Question

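I have a list of dictionaries, each describing a file (file format, file name, file size, ... and the full path to the file, which is always unique). The goal is to exclude all dictionaries that describe copies of the same file: I want exactly one dict (entry) per file, no matter how many copies exist.

In other words: if two (or more) dicts differ in only one key (the path), keep just one of them.

For example, here is the source list: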

src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

The result should look like this:

dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

Answer 1

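Use another dictionary that maps each dictionary in the list, with the "ignored" keys left out, to the actual dictionary. That way only one dictionary of each kind is retained. Dicts are not hashable, so you have to use (sorted) tuples of their items as the keys instead.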

src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
ignored_keys = ["path"]
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_list = list(filtered.values())

The result (later entries overwrite earlier ones, so the last dictionary of each group is kept):

[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'}, 
 {'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
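
The dict comprehension above keeps the last dictionary seen for each group. If the first occurrence should win instead, a minimal sketch using `dict.setdefault` could look like the following. The function name `dedupe_keep_first` and its parameters are hypothetical (they do not appear in the original answer), and the sketch assumes all values in the dictionaries are hashable and that the code runs on Python 3.7+, where dicts preserve insertion order.

```python
def dedupe_keep_first(dicts, ignored_keys=("path",)):
    """Return the first dict of each group that is equal apart from ignored_keys.

    Hypothetical helper: assumes every value in the dicts is hashable.
    """
    seen = {}
    for d in dicts:
        # Sorted (key, value) pairs form a hashable signature that does not
        # depend on the order in which the keys were inserted.
        signature = tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys)
        # setdefault stores d only if this signature has not been seen before.
        seen.setdefault(signature, d)
    return list(seen.values())


src_list = [
    {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
    {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
    {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
    {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'},
]

dst_list = dedupe_keep_first(src_list)
print(dst_list)
```

With this variant, both surviving entries are the ones whose path is `'C:/'`, because those come first in `src_list`.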

Answer 2

My own solution (maybe not the best, but it works):

    dst_list = []
    seen_items = set()
    for dictionary in src_list:
        # temporarily cut the unique key (path) out so it does not affect the duplicate check
        path = dictionary.pop('path', None)
        # sort the items so the signature does not depend on key insertion order
        t = tuple(sorted(dictionary.items()))
        # put the unique key back whether or not this is a duplicate, so src_list stays intact
        dictionary['path'] = path
        if t not in seen_items:
            seen_items.add(t)
            # duplicate check passed: keep this dictionary
            dst_list.append(dictionary)

    print(dst_list)

where

src_list is the original list, which may contain duplicates,

dst_list is the final list without duplicates,

path is the unique key.
