How to make values in a list of dictionaries unique?
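I have a list of dictionaries in Python that looks like this:
d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100}, {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}, {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110}, ...]
What I want is to keep each combination of feature_a, feature_b and feature_c unique.
For example, if 3 entries have the same feature_a and feature_b, and their feature_c values are 100, 100 and 150, then after the operation only the entries with 100 and 150 should remain.
How can I do this?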
============================================= ===================
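Update:
OK, thanks to Anand's great answer, this works well. However, I have one more question.
Suppose there is a new feature_d and the dictionaries look like this:
d = [{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'}, {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'}, {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}, ...]
I want to deduplicate on feature_a, feature_b and feature_c only, leaving feature_d out of the comparison. How can I do this?
Many thanks.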
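If the order of the initial list d does not matter, you can convert each dictionary's .items() into a hashable frozenset(), collect those into a set() or frozenset(), and then convert each inner frozenset() back into a dictionary. Example -
uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
Sets do not allow duplicate elements, so repeated dictionaries collapse into one, although you lose the order of the list. This only works when every value in the dictionaries is hashable. In Python 2.x the list(...) call is not needed, because map() returns a list.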
Example/Demo -
>>> import pprint
>>> pprint.pprint(d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150}]
>>> uniq_d = list(map(dict, frozenset(frozenset(i.items()) for i in d)))
>>> pprint.pprint(uniq_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150}]
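If the original order does matter, the same frozenset idea can be combined with a "seen" set. The following is only a minimal sketch: the function name unique_dicts is made up for illustration, and it assumes all dictionary values are hashable.

```python
def unique_dicts(dicts):
    """Drop exact duplicate dicts, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in dicts:
        # A frozenset of (key, value) pairs is hashable as long as
        # every value in the dict is hashable.
        fingerprint = frozenset(item.items())
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(item)
    return result


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110},
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100},
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 150},
]
uniq_d = unique_dicts(d)
```

Here uniq_d keeps the first, second, third and fifth entries of d in their original order, and only the fourth (an exact repeat of the first) is dropped.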
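For the new requirement -
"However, what if I have another feature_d but I only want to dedup feature_a, _b and _c? If two entries have the same feature_a, _b and _c, they are considered duplicates, no matter what is in feature_d."
A simple approach is to use a set together with a new list: add only the features you care about to the set, and check for membership using only those features. The first entry seen for each combination is the one that is kept. Example -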
seen_set = set()
new_d = []
for i in d:
    # Only these three features decide whether an entry is a duplicate.
    key = (i['feature_a'], i['feature_b'], i['feature_c'])
    if key not in seen_set:
        new_d.append(i)
        seen_set.add(key)
Example/Demo -
>>> d = [{'feature_a':1, 'feature_b':'Jul', 'feature_c':100, 'feature_d':'A'},
... {'feature_a':2, 'feature_b':'Jul', 'feature_c':150, 'feature_d': 'B'},
... {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'F'},
... {'feature_a':1, 'feature_b':'Mar', 'feature_c':110, 'feature_d':'G'}]
>>> seen_set = set()
>>> new_d = []
>>> for i in d:
...     key = (i['feature_a'], i['feature_b'], i['feature_c'])
...     if key not in seen_set:
...         new_d.append(i)
...         seen_set.add(key)
...
>>> pprint.pprint(new_d)
[{'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
{'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
{'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'}]
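The loop above can also be wrapped in a small helper so the list of features to compare is passed in rather than hard-coded. This is a sketch under assumptions: the name dedupe_by and its keys parameter are hypothetical, every dictionary is assumed to contain all of the listed keys, and the corresponding values are assumed to be hashable.

```python
def dedupe_by(dicts, keys):
    """Keep the first dict seen for each combination of the given keys."""
    seen = set()
    result = []
    for item in dicts:
        # Build the comparison key from the selected features only;
        # any other field (such as feature_d) is ignored.
        key = tuple(item[k] for k in keys)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


d = [
    {'feature_a': 1, 'feature_b': 'Jul', 'feature_c': 100, 'feature_d': 'A'},
    {'feature_a': 2, 'feature_b': 'Jul', 'feature_c': 150, 'feature_d': 'B'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'F'},
    {'feature_a': 1, 'feature_b': 'Mar', 'feature_c': 110, 'feature_d': 'G'},
]
new_d = dedupe_by(d, ['feature_a', 'feature_b', 'feature_c'])
```

With this input, new_d contains the first three entries, and the entry with feature_d 'G' is dropped because its feature_a, feature_b and feature_c match the entry with 'F'.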