比较大量图的同构性
Comparing a large number of graphs for isomorphism
我正在比较一大组 networkx 图的同构性,其中大多数图不应该是同构的(例如,假设 0-20% 与列表中的某些东西同构)。
我试过下面的方法。
graphs = [] # A list of networkx graphs
unique = [] # A list of unique graphs
for new in graphs:
for old in unique:
if nx.is_isomorphic(new, old[0]):
break
else:
unique.append([new])
这让我得到了一个更快的缩减集,但我仍然觉得它太慢了,不适合理想的使用。是否有一些更快的算法来处理此类问题(比较传递交换属性对)或将此算法扩展到多核设置的方法(运行 在 20 核机器上)。
我已经根据节点/边的数量过滤这些数据集,我们可以假设 nx.is_isomorphic 函数不能通过任何过滤类型的操作变得更快。我现在也不能轻易更改工具,所以使用编译包不是一个选项。
附加信息:
图往往大约有 16-20 个节点,总共有 24-48 条边,有很多互连,因此每个节点大约有 8 条边。每条边也被标记,但只有 2-3 种类型的边被使用过。
您可以在 PyPy 上试用您的代码,它为纯 Python 代码提供即时编译。对于可能的性能提升,他们说...
...depends greatly on the type of task being performed. The geometric average of all benchmarks is 0.13 or 7.5 times faster than CPython
如果您的工作负载是 CPU-bound(看起来是这样)并且 Python 过程很长-运行(因此可以执行 JIT 编译)那么性能提升可以重要的。 NetworkX 是纯 Python(它有像 numpy 这样的可选依赖项,但它们是额外功能所必需的),特别是 isomorph
模块。我尝试了 PyPy 5.7.1 和 networkx/algorithms/isomorphism/tests/test_isomorphism.py
passes。该套件总体上有一些故障:
Ran 2952 tests in 51.311s
FAILED (failures=3, skipped=54)
Test failed: <unittest.runner.TextTestResult run=2952 errors=0 failures=3>
在 Python 2.7.12 上是:
Ran 2952 tests in 88.915s
OK (skipped=54)
你能使用 nauty(http://users.cecs.anu.edu.au/~bdm/nauty/,在 linux 发行版中可用)吗?它有一个规范的标签算法,速度快,可能适用于您的问题。规范标记使同构图相同(规范化)。例如,使用来自一组随机图的 graph6 格式输出给出以下同构图计数
$ cat g6.py
import networkx as nx
for i in range(100000):
print(nx.generate_graph6(nx.fast_gnp_random_graph(4,0.2),header=False))
$ python g6.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 100000 graphs labelled from stdin to stdout in 0.21 sec.
4898 C`
167 C^
10 C~
26408 C?
39392 C@
19684 CB
1575 CF
1608 CJ
1170 CN
288 Cr
4800 CR
这是4个节点的11张图-
$ cat atlas.py
import networkx as nx
for g in nx.atlas.graph_atlas_g()[8:19]:
print(nx.generate_graph6(g,header=False))
$ python atlas.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 11 graphs labelled from stdin to stdout in 0.00 sec.
1 C`
1 C^
1 C~
1 C?
1 C@
1 CB
1 CF
1 CJ
1 CN
1 Cr
1 CR
如果运行速度太慢,并行化该方法会很容易。
正如其他人提到的,如果您想留在 Python + Networkx,您可以使用 could_be_isomorphic
来过滤您的图表。
问题在于此方法需要 2 个图形作为输入,而不是数百万个。如果用这种方法比较每一对图,时间会很长。
查看 sourcecode of could_be_isomorphic
,它比较了两个图的度数、三角形和派系序列数。如果它们不相等,则图不能同构。
您可以将此指纹打包到一个函数中,根据此指纹对图形进行排序,然后将它们与 itertools.groupby
分组。将会有绝大多数孤立的图表。然后可以检查具有相同指纹的少数图是否同构。
使用 100 000 个随机图的列表:
many_graphs = [nx.fast_gnp_random_graph(random.randint(16, 22), 0.2) for _ in range(100000)]
只有 500 个指纹被至少 2 个图表共享。如果加上边缘类型信息,那么常见的指纹就更少了。
这是一个包含 3000 个图的示例,每个图有 10 到 14 个节点:
import networkx as nx
from itertools import groupby
import random
many_graphs = [nx.fast_gnp_random_graph(
random.randint(10, 14), 0.3) for _ in range(3000)]
def graph_fingerprint(g):
order = g.order()
d = g.degree()
t = nx.triangles(g)
c = nx.number_of_cliques(g)
props = [[d[v], t[v], c[v]] for v in d]
props.sort()
# TODO: Add count of edge types.
return(props)
sorted_graphs = sorted(many_graphs, key=graph_fingerprint)
for f, g in groupby(sorted_graphs, key=graph_fingerprint):
similar_graphs = list(g)
n = len(similar_graphs)
if n > 1:
print("Found %d graphs which could be isomorphic." % n)
for i in range(n):
for j in range(i + 1, n):
g1, g2 = similar_graphs[i], similar_graphs[j]
if g1 != g2 and nx.is_isomorphic(g1, g2):
print(" %s and %s are isomorphic!" %
(nx.generate_graph6(g1,header=False), nx.generate_graph6(g2,header=False)))
不到1秒就找到4个同构对:
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Id?OGG_C? and IaKO_@?@? are isomorphic!
Found 6 graphs which could be isomorphic.
I?OWcGG?G and I?OCSa?@_ are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
I_uI???JG and II??QDNA? are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
IDOCCY@GG and IOGC@`dS? are isomorphic!
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
这是最后两个同构图。 "IDOCCY@GG":
和"IOGC@\dS?":
这里有 2 个具有相同指纹但不同构的图:
指纹识别可以并行完成。排序和分组必须在 1 CPU 上进行,但是每个组的同构检查可以在不同的 CPU 中完成。
我正在比较一大组 networkx 图的同构性,其中大多数图不应该是同构的(例如,假设 0-20% 与列表中的某些东西同构)。
我试过下面的方法。
graphs = [] # A list of networkx graphs
unique = [] # A list of unique graphs
for new in graphs:
for old in unique:
if nx.is_isomorphic(new, old[0]):
break
else:
unique.append([new])
这让我得到了一个更快的缩减集,但我仍然觉得它太慢了,不适合理想的使用。是否有一些更快的算法来处理此类问题(比较传递交换属性对)或将此算法扩展到多核设置的方法(运行 在 20 核机器上)。
我已经根据节点/边的数量过滤这些数据集,我们可以假设 nx.is_isomorphic 函数不能通过任何过滤类型的操作变得更快。我现在也不能轻易更改工具,所以使用编译包不是一个选项。
附加信息:
图往往大约有 16-20 个节点,总共有 24-48 条边,有很多互连,因此每个节点大约有 8 条边。每条边也被标记,但只有 2-3 种类型的边被使用过。
您可以在 PyPy 上试用您的代码,它为纯 Python 代码提供即时编译。对于可能的性能提升,他们说...
...depends greatly on the type of task being performed. The geometric average of all benchmarks is 0.13 or 7.5 times faster than CPython
如果您的工作负载是 CPU-bound(看起来是这样)并且 Python 过程很长-运行(因此可以执行 JIT 编译)那么性能提升可以重要的。 NetworkX 是纯 Python(它有像 numpy 这样的可选依赖项,但它们是额外功能所必需的),特别是 isomorph
模块。我尝试了 PyPy 5.7.1 和 networkx/algorithms/isomorphism/tests/test_isomorphism.py
passes。该套件总体上有一些故障:
Ran 2952 tests in 51.311s
FAILED (failures=3, skipped=54)
Test failed: <unittest.runner.TextTestResult run=2952 errors=0 failures=3>
在 Python 2.7.12 上是:
Ran 2952 tests in 88.915s
OK (skipped=54)
你能使用 nauty(http://users.cecs.anu.edu.au/~bdm/nauty/,在 linux 发行版中可用)吗?它有一个规范的标签算法,速度快,可能适用于您的问题。规范标记使同构图相同(规范化)。例如,使用来自一组随机图的 graph6 格式输出给出以下同构图计数
$ cat g6.py
import networkx as nx
for i in range(100000):
print(nx.generate_graph6(nx.fast_gnp_random_graph(4,0.2),header=False))
$ python g6.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 100000 graphs labelled from stdin to stdout in 0.21 sec.
4898 C`
167 C^
10 C~
26408 C?
39392 C@
19684 CB
1575 CF
1608 CJ
1170 CN
288 Cr
4800 CR
这是4个节点的11张图-
$ cat atlas.py
import networkx as nx
for g in nx.atlas.graph_atlas_g()[8:19]:
print(nx.generate_graph6(g,header=False))
$ python atlas.py |nauty-labelg |sort |uniq -c
>A labelg
>Z 11 graphs labelled from stdin to stdout in 0.00 sec.
1 C`
1 C^
1 C~
1 C?
1 C@
1 CB
1 CF
1 CJ
1 CN
1 Cr
1 CR
如果运行速度太慢,并行化该方法会很容易。
正如其他人提到的,如果您想留在 Python + Networkx,您可以使用 could_be_isomorphic
来过滤您的图表。
问题在于此方法需要 2 个图形作为输入,而不是数百万个。如果用这种方法比较每一对图,时间会很长。
查看 sourcecode of could_be_isomorphic
,它比较了两个图的度数、三角形和派系序列数。如果它们不相等,则图不能同构。
您可以将此指纹打包到一个函数中,根据此指纹对图形进行排序,然后将它们与 itertools.groupby
分组。将会有绝大多数孤立的图表。然后可以检查具有相同指纹的少数图是否同构。
使用 100 000 个随机图的列表:
many_graphs = [nx.fast_gnp_random_graph(random.randint(16, 22), 0.2) for _ in range(100000)]
只有 500 个指纹被至少 2 个图表共享。如果加上边缘类型信息,那么常见的指纹就更少了。
这是一个包含 3000 个图的示例,每个图有 10 到 14 个节点:
import networkx as nx
from itertools import groupby
import random
many_graphs = [nx.fast_gnp_random_graph(
random.randint(10, 14), 0.3) for _ in range(3000)]
def graph_fingerprint(g):
order = g.order()
d = g.degree()
t = nx.triangles(g)
c = nx.number_of_cliques(g)
props = [[d[v], t[v], c[v]] for v in d]
props.sort()
# TODO: Add count of edge types.
return(props)
sorted_graphs = sorted(many_graphs, key=graph_fingerprint)
for f, g in groupby(sorted_graphs, key=graph_fingerprint):
similar_graphs = list(g)
n = len(similar_graphs)
if n > 1:
print("Found %d graphs which could be isomorphic." % n)
for i in range(n):
for j in range(i + 1, n):
g1, g2 = similar_graphs[i], similar_graphs[j]
if g1 != g2 and nx.is_isomorphic(g1, g2):
print(" %s and %s are isomorphic!" %
(nx.generate_graph6(g1,header=False), nx.generate_graph6(g2,header=False)))
不到1秒就找到4个同构对:
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Id?OGG_C? and IaKO_@?@? are isomorphic!
Found 6 graphs which could be isomorphic.
I?OWcGG?G and I?OCSa?@_ are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
I_uI???JG and II??QDNA? are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
IDOCCY@GG and IOGC@`dS? are isomorphic!
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
这是最后两个同构图。 "IDOCCY@GG":
和"IOGC@\dS?":
这里有 2 个具有相同指纹但不同构的图:
指纹识别可以并行完成。排序和分组必须在 1 CPU 上进行,但是每个组的同构检查可以在不同的 CPU 中完成。