如何合并具有交集的集合(连通分量算法)?

How to merge sets which have intersections (connected components algorithm)?

有什么有效的方法可以合并有交集的集合吗?例如:

l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]

预期结果是:

r = [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]

应合并所有具有交集(公共组件)的集合。例如:

{1, 3} & {2, 3}
# {3}

所以这两个集合应该合并:

{1, 3} | {2, 3}
# {1, 2, 3}

很遗憾,我没有任何可行的解决方案。

更新:结果中集合的顺序并不重要。

@mkrieger1 在评论中提到的实现 connected components algorithm 的一种有效方法是将集合列表转换为一组可散列的冻结集,以便在遍历它时找到一个相交的冻结集使用当前版本,您可以轻松地将其从池中删除:

pool = set(map(frozenset, l))
groups = []
while pool:
    groups.append(set(pool.pop()))
    while True:
        for candidate in pool:
            if groups[-1] & candidate:
                groups[-1] |= candidate
                pool.remove(candidate)
                break
        else:
            break

给定 l = [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}]groups 将变为:

[{1, 2, 3}, {4, 5, 6, 7}, {8, 9}]

并且给定 l = [{1, 2}, {3, 4}, {2, 3}]groups 将变为:

[{1, 2, 3, 4}]

并且给定 l = [{1}, {2}, {1, 2}]groups 将变为:

[{1, 2}]

我提出这个解决方案:

def merge_sets(set_list):
    if len(set_list) == 0:
        # short circuit to avoid errors
        return []

    current_set = set_list[0]
    new_set_list = [current_set, ]

    for s in set_list[1:]:          # iterate from the second element
        if len(current_set.intersection(s)) > 0:
            current_set.update(s)
        else:
            current_set = set(s)    # copy
            new_set_list.append(current_set)

    return new_set_list

适用于的测试用例:

test_cases = [
    {
        'input': [{1, 3}, {2, 3}, {4, 5}, {6, 5}, {7, 5}, {8, 9}],
        'output': [{1, 2, 3}, {4, 5, 6, 7}, {8, 9}],
    },
    {
        'input': [{1, 2}, {2, 3}, {3, 4}],
        'output': [{1, 2, 3, 4}, ],
    },
    {
        'input': [{1}, {2}, {1, 2}],
        'output': [{1}, {1, 2}],
    },
    {
        'input': [{1, 2}, {3, 4}, {2, 3}],
        'output': [{1, 2}, {2, 3, 4}],
    },
]

for case in test_cases:
    print('input   ', case['input'])
    print('expected', case['output'])

    new_output = merge_sets(case['input'])
    print('real    ', new_output)
    assert new_output == case['output']

这对你有用吗?