Count occurrences within a list of list of tuples
I think it's best to start with the input and output:
list_of_items = [
    {"A": "abc", "B": "dre", "C": "ccp"},
    {"A": "qwe", "B": "dre", "C": "ccp"},
    {"A": "abc", "B": "dre", "C": "ccp"},
]
result = {'A-abc-->B': {'dre': 2},
          'A-abc-->C': {'ccp': 2},
          'A-qwe-->B': {'dre': 1},
          'A-qwe-->C': {'ccp': 1},
          'B-dre-->A': {'abc': 2, 'qwe': 1},
          'B-dre-->C': {'ccp': 3},
          'C-ccp-->A': {'abc': 2, 'qwe': 1},
          'C-ccp-->B': {'dre': 3}}
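My initial input is items arriving as a stream. The items are basically dictionaries with keys and values.
My goal is to take each specific key (with its value) and get the maximum for all the other keys that come with it.
So if, out of 100 items where key "A" has value "1", I get 90 items with key "B" value "2" and 10 items with key "B" value "1111", I want to see a list showing those numbers: B2=90, B1111=10.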
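A minimal sketch of the tally described above, using hypothetical data built to match the numbers in the example (100 items with A="1", of which 90 have B="2" and 10 have B="1111"). `collections.Counter` gives the per-value counts, and `most_common(1)` picks the maximum:

```python
from collections import Counter

# Hypothetical data matching the example in the text:
# 100 items with key "A" == "1"; 90 carry B="2", 10 carry B="1111".
items = [{"A": "1", "B": "2"}] * 90 + [{"A": "1", "B": "1111"}] * 10

# Count the values of "B" among the items whose "A" is "1"
b_counts = Counter(item["B"] for item in items if item["A"] == "1")
print(dict(b_counts))           # {'2': 90, '1111': 10}

# The most frequent value of "B" for A == "1"
print(b_counts.most_common(1))  # [('2', 90)]
```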
My code works. However, my real-life scenario contains 100,000+ distinct values for about 20 keys. Also, my end goal is to run this as a job on Flink.
So I'm looking for help with Counter / the Python stream API.
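To put the scale into numbers: an item with n keys contributes n·(n−1) ordered (key, other key) pairs, so with about 20 keys every item costs 20 × 19 = 380 count updates, no matter how many distinct values exist overall. A quick check with `itertools.permutations` on a hypothetical 20-key item (the key and value names are made up):

```python
from itertools import permutations

# Hypothetical item with 20 keys (the text mentions "about 20 keys")
item = {f"K{i}": f"v{i}" for i in range(20)}

# Every ordered pair of distinct (key, value) items: n * (n - 1)
pairs = list(permutations(item.items(), 2))
print(len(pairs))  # 380
```

Because the per-item work is constant, counting can be done item by item as the stream arrives; the memory cost is driven by the number of distinct (key, value, other key, other value) combinations instead.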
from pprint import pprint

all_tuple_list_items = []
for dict_item in list_of_items:
    # Turn each dict into a list of (key, value) tuples
    list_of_tuples = [(k, v) for (k, v) in dict_item.items()]
    all_tuple_list_items.append(list_of_tuples)

result_dict = {}
for list_of_tuples in all_tuple_list_items:
    for id_tuple in list_of_tuples:
        # All other (key, value) pairs of the same item
        all_other_tuples = list_of_tuples.copy()
        all_other_tuples.remove(id_tuple)
        for corresponding_other_tu in all_other_tuples:
            # Group key such as "A-abc-->B"
            ids_connection_id = id_tuple[0] + "-" + str(id_tuple[1]) + "-->" + corresponding_other_tu[0]
            corresponding_id = str(corresponding_other_tu[1])
            if result_dict.get(ids_connection_id) is None:
                result_dict[ids_connection_id] = {corresponding_id: 1}
            else:
                if result_dict[ids_connection_id].get(corresponding_id) is None:
                    result_dict[ids_connection_id][corresponding_id] = 1
                else:
                    result_dict[ids_connection_id][corresponding_id] = result_dict[ids_connection_id][
                        corresponding_id] + 1
pprint(result_dict)
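The same counting can be written more compactly with `defaultdict(Counter)`, which removes the manual "is the key there yet?" checks. This is a sketch, not the asker's code; the helper name `add_item` is hypothetical. It updates the counts one item at a time, so the loop can consume a generator or any other lazy stream instead of first building `all_tuple_list_items`:

```python
from collections import Counter, defaultdict
from pprint import pprint

list_of_items = [
    {"A": "abc", "B": "dre", "C": "ccp"},
    {"A": "qwe", "B": "dre", "C": "ccp"},
    {"A": "abc", "B": "dre", "C": "ccp"},
]

# "key-value-->other_key" -> Counter of the other key's values
result = defaultdict(Counter)

def add_item(result, item):
    """Hypothetical helper: add the co-occurrence counts of one dict item."""
    for key, value in item.items():
        for other_key, other_value in item.items():
            if other_key != key:
                result[f"{key}-{value}-->{other_key}"][str(other_value)] += 1

# Any iterable works here, including a generator yielding items lazily
for item in list_of_items:
    add_item(result, item)

pprint({k: dict(v) for k, v in result.items()})
```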
I got it working. However, I'd still like a more efficient approach, using Counter and streams. Is that possible?
Code:
from pprint import pprint

all_tuple_list_items = []
for dict_item in list_of_items:
    # Here each item is a single-element list wrapping the dict,
    # e.g. [{"A": "abc", "B": "dre", "C": "ccp"}], hence dict_item[0]
    list_of_tuples = [(k, v) for (k, v) in dict_item[0].items()]
    all_tuple_list_items.append(list_of_tuples)

result_dict = {}
for list_of_tuples in all_tuple_list_items:
    for id_tuple in list_of_tuples:
        # All other (key, value) pairs of the same item
        all_other_tuples = list_of_tuples.copy()
        all_other_tuples.remove(id_tuple)
        for corresponding_other_tu in all_other_tuples:
            # Group key such as "A-abc-->B"
            ids_connection_id = id_tuple[0] + "-" + str(id_tuple[1]) + "-->" + corresponding_other_tu[0]
            corresponding_id = str(corresponding_other_tu[1])
            if result_dict.get(ids_connection_id) is None:
                result_dict[ids_connection_id] = {corresponding_id: 1}
            else:
                if result_dict[ids_connection_id].get(corresponding_id) is None:
                    result_dict[ids_connection_id][corresponding_id] = 1
                else:
                    result_dict[ids_connection_id][corresponding_id] = result_dict[ids_connection_id][
                        corresponding_id] + 1
pprint(result_dict)
You can use the function permutations() to generate all ordered pairs of items in each dictionary and count them with Counter. Finally, you can use defaultdict() to group the items in the Counter:
from collections import Counter, defaultdict
from itertools import permutations
from pprint import pprint

list_of_items = [
    [{"A": "abc", "B": "dre", "C": "ccp"}],
    [{"A": "qwe", "B": "dre", "C": "ccp"}],
    [{"A": "abc", "B": "dre", "C": "ccp"}],
]

# Count every ordered pair of (key, value) items within each dict
c = Counter(p for i in list_of_items
            for p in permutations(i[0].items(), 2))

# Regroup: "key-value-->other_key" -> {other_value: count}
d = defaultdict(dict)
for ((i, j), (k, l)), num in c.items():
    d[f'{i}-{j}-->{k}'][l] = num
pprint(d)
Output:
defaultdict(<class 'dict'>,
            {'A-abc-->B': {'dre': 2},
             'A-abc-->C': {'ccp': 2},
             'A-qwe-->B': {'dre': 1},
             'A-qwe-->C': {'ccp': 1},
             'B-dre-->A': {'abc': 2, 'qwe': 1},
             'B-dre-->C': {'ccp': 3},
             'C-ccp-->A': {'abc': 2, 'qwe': 1},
             'C-ccp-->B': {'dre': 3}})
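For a stream, the same idea can be applied incrementally: `Counter.update()` with an iterable adds 1 per element, so each arriving item can be folded in as it comes, and the regrouping only needs to run when a result is read. A sketch under the assumption that the stream yields plain dicts; `stream()` and `grouped()` are hypothetical names, and wiring this into a Flink job is not shown:

```python
from collections import Counter, defaultdict
from itertools import permutations

def stream():
    """Hypothetical stand-in for the incoming stream of items."""
    yield {"A": "abc", "B": "dre", "C": "ccp"}
    yield {"A": "qwe", "B": "dre", "C": "ccp"}
    yield {"A": "abc", "B": "dre", "C": "ccp"}

c = Counter()
for item in stream():
    # One count per ordered (key, value) pair: n * (n - 1) updates for n keys
    c.update(permutations(item.items(), 2))

def grouped(counter):
    """Regroup flat pair counts into "key-value-->other_key" -> {value: count}."""
    d = defaultdict(dict)
    for ((i, j), (k, l)), num in counter.items():
        d[f'{i}-{j}-->{k}'][l] = num
    return d

d = grouped(c)
# Most frequent companion value per group
top = {key: max(vals.items(), key=lambda kv: kv[1]) for key, vals in d.items()}
print(top["B-dre-->A"])  # ('abc', 2)
```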