如何合并具有交集的集合(连通分量算法)?
How to merge sets which have intersections (connected components algorithm)?
有什么有效的方法可以合并有交集的集合吗?例如:
l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]
预期结果是:
r = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
应合并所有具有交集(公共组件)的集合。例如:
{1, 3} & {2, 3}
# {3}
所以这两个集合应该合并:
{1, 3} | {2, 3}
# {1, 2, 3}
很遗憾,我没有任何可行的解决方案。
更新:结果中集合的顺序并不重要。
@mkrieger1 在评论中提到的实现 connected components algorithm 的一种有效方法是将集合列表转换为一组可散列的冻结集,以便在遍历它时找到一个相交的冻结集使用当前版本,您可以轻松地将其从池中删除:
pool = set(map(frozenset, l))
groups = []
while pool:
groups.append(set(pool.pop()))
while True:
for candidate in pool:
if groups[-1] & candidate:
groups[-1] |= candidate
pool.remove(candidate)
break
else:
break
给定 l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]
,groups
将变为:
[{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
并且给定 l = [{1, 2}, {3, 4}, {2, 3}]
,groups
将变为:
[{1, 2, 3, 4}]
并且给定 l = [{1}, {2}, {1, 2}]
,groups
将变为:
[{1, 2}]
我提出这个解决方案:
def merge_sets(set_list):
if len(set_list) == 0:
# short circuit to avoid errors
return []
current_set = set_list[0]
new_set_list = [current_set, ]
for s in set_list[1:]: # iterate from the second element
if len(current_set.intersection(s)) > 0:
current_set.update(s)
else:
current_set = set(s) # copy
new_set_list.append(current_set)
return new_set_list
适用于的测试用例:
test_cases = [
{
'input': [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}],
'output': [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}],
},
{
'input': [{1, 2}, {2, 3}, {3, 4}],
'output': [{1, 2, 3, 4}, ],
},
{
'input': [{1}, {2}, {1, 2}],
'output': [{1}, {1, 2}],
},
{
'input': [{1, 2}, {3, 4}, {2, 3}],
'output': [{1, 2}, {2, 3, 4}],
},
]
for case in test_cases:
print('input ', case['input'])
print('expected', case['output'])
new_output = merge_sets(case['input'])
print('real ', new_output)
assert new_output == case['output']
这对你有用吗?
有什么有效的方法可以合并有交集的集合吗?例如:
l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]
预期结果是:
r = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
应合并所有具有交集(公共组件)的集合。例如:
{1, 3} & {2, 3}
# {3}
所以这两个集合应该合并:
{1, 3} | {2, 3}
# {1, 2, 3}
很遗憾,我没有任何可行的解决方案。
更新:结果中集合的顺序并不重要。
@mkrieger1 在评论中提到的实现 connected components algorithm 的一种有效方法是将集合列表转换为一组可散列的冻结集,以便在遍历它时找到一个相交的冻结集使用当前版本,您可以轻松地将其从池中删除:
pool = set(map(frozenset, l))
groups = []
while pool:
groups.append(set(pool.pop()))
while True:
for candidate in pool:
if groups[-1] & candidate:
groups[-1] |= candidate
pool.remove(candidate)
break
else:
break
给定 l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]
,groups
将变为:
[{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]
并且给定 l = [{1, 2}, {3, 4}, {2, 3}]
,groups
将变为:
[{1, 2, 3, 4}]
并且给定 l = [{1}, {2}, {1, 2}]
,groups
将变为:
[{1, 2}]
我提出这个解决方案:
def merge_sets(set_list):
if len(set_list) == 0:
# short circuit to avoid errors
return []
current_set = set_list[0]
new_set_list = [current_set, ]
for s in set_list[1:]: # iterate from the second element
if len(current_set.intersection(s)) > 0:
current_set.update(s)
else:
current_set = set(s) # copy
new_set_list.append(current_set)
return new_set_list
适用于的测试用例:
test_cases = [
{
'input': [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}],
'output': [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}],
},
{
'input': [{1, 2}, {2, 3}, {3, 4}],
'output': [{1, 2, 3, 4}, ],
},
{
'input': [{1}, {2}, {1, 2}],
'output': [{1}, {1, 2}],
},
{
'input': [{1, 2}, {3, 4}, {2, 3}],
'output': [{1, 2}, {2, 3, 4}],
},
]
for case in test_cases:
print('input ', case['input'])
print('expected', case['output'])
new_output = merge_sets(case['input'])
print('real ', new_output)
assert new_output == case['output']
这对你有用吗?