以 COO 格式构建图形连接矩阵
Construct graph connectivity matrices in COO format
我在处理图形数据时遇到了以下子任务:
我需要构建 COO 格式的图连接矩阵,用于具有来自“边界”索引数组的多个完全连接组件的图。
例如,给定数组
borders = [0, 2, 5]
生成的 COO 矩阵应该是
coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
[0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].
也就是说,borders 数组包含 ranges 个应该形成全连接子图的节点(包括起始索引和排除结束索引) .
我提出了以下算法,但我怀疑性能可以提高:
import numpy as np
def get_coo(borders):
edge_list = []
for s, e in zip(borders, borders[1:]):
# create fully-connected subgraph
arr = np.arange(s, e)
t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
t = t.T
edge_list.append(t)
edge_list = np.concatenate(edge_list, axis=1)
return edge_list
我觉得可能有一个更快的解决方案,也许使用一些 numpy 向量化操作。
有没有人有什么想法?
由于您的目标是获得比现有解决方案更快的解决方案,因此您可以探索 itertools
以高效解决此问题。在更大的 border
列表上测试时,此方法的基准测试 比您当前的方法快 大约 25 倍。
import numpy as np
from itertools import product, chain
def get_coo(borders):
edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
return list(edges)
output = get_coo(borders)
## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.
np.array(output).T
array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
[0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])
替代方法和基准:
Note: These have been benchmarked on your current small list as well as on a larger border list as generated by borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]
完全图的不相交并集
您要做的实际上是组合不相交的完整图。此类图的邻接矩阵具有沿其对角线的选择性项目的完整连接。您可以使用 networkx
来解决这些问题。
虽然比您当前的解决方案慢,但您会发现处理这些图形对象比使用 NumPy 表示图形要容易得多,也更有收获。
方法一:
- 假设节点是有序的,计算每个子图中的节点个数为
i
- 创建一个用 1 填充形状
i*i
的完整矩阵
- 使用
nx.disjoint_union_all
合并图表
- 获取此图的边
import numpy as np
import networkx as nx
def get_coo(borders):
graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]
edges = nx.disjoint_union_all(graphs).edges()
return edges
%timeit get_coo(borders)
#Small- 277 µs ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Large- 300 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
方法二:
- 使用
zip
迭代 borders
的滚动 2 元组
- 使用来自这些元组的
ranges(start, end)
创建 nx.complete_graph
- 使用
nx.disjoint_union_all
合并图表
- 获取此图的边
import numpy as np
import networkx as nx
def get_coo(borders):
graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]
edges = nx.disjoint_union_all(graphs).edges()
return edges
%timeit get_coo(borders)
#Small- 116 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 207 ms ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
输出比以前快一点,但缺少节点上的自循环,必须单独添加
使用itertools.product
方法三:
- 使用
zip
迭代 borders
的滚动 2 元组
- 对每个子图使用
itertools.product
个完全连接的边列表
- 使用
itertools.chain
“附加”两个迭代器
- Return 它们作为边缘
import numpy as np
from itertools import product, chain
def get_coo(borders):
edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
return list(edges)
%timeit get_coo(borders)
#Small- 3.91 µs ± 787 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Large- 183 µs ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
这种方法比您当前的方法快大约 25 倍
您当前的方法 - 基准测试
def get_coo(borders):
edge_list = []
for s, e in zip(borders, borders[1:]):
# create fully-connected subgraph
arr = np.arange(s, e)
t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
t = t.T
edge_list.append(t)
edge_list = np.concatenate(edge_list, axis=1)
return edge_list
%timeit get_coo(borders)
#Small- 95.1 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 3.91 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
我在处理图形数据时遇到了以下子任务:
我需要构建 COO 格式的图连接矩阵,用于具有来自“边界”索引数组的多个完全连接组件的图。
例如,给定数组
borders = [0, 2, 5]
生成的 COO 矩阵应该是
coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
[0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].
也就是说,borders 数组包含 ranges 个应该形成全连接子图的节点(包括起始索引和排除结束索引) .
我提出了以下算法,但我怀疑性能可以提高:
import numpy as np
def get_coo(borders):
edge_list = []
for s, e in zip(borders, borders[1:]):
# create fully-connected subgraph
arr = np.arange(s, e)
t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
t = t.T
edge_list.append(t)
edge_list = np.concatenate(edge_list, axis=1)
return edge_list
我觉得可能有一个更快的解决方案,也许使用一些 numpy 向量化操作。
有没有人有什么想法?
由于您的目标是获得比现有解决方案更快的解决方案,因此您可以探索 itertools
以高效解决此问题。在更大的 border
列表上测试时,此方法的基准测试 比您当前的方法快 大约 25 倍。
import numpy as np
from itertools import product, chain
def get_coo(borders):
edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
return list(edges)
output = get_coo(borders)
## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.
np.array(output).T
array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
[0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])
替代方法和基准:
Note: These have been benchmarked on your current small list as well as on a larger border list as generated by
borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]
完全图的不相交并集
您要做的实际上是组合不相交的完整图。此类图的邻接矩阵具有沿其对角线的选择性项目的完整连接。您可以使用 networkx
来解决这些问题。
虽然比您当前的解决方案慢,但您会发现处理这些图形对象比使用 NumPy 表示图形要容易得多,也更有收获。
方法一:
- 假设节点是有序的,计算每个子图中的节点个数为
i
- 创建一个用 1 填充形状
i*i
的完整矩阵
- 使用
nx.disjoint_union_all
合并图表
- 获取此图的边
import numpy as np
import networkx as nx
def get_coo(borders):
graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]
edges = nx.disjoint_union_all(graphs).edges()
return edges
%timeit get_coo(borders)
#Small- 277 µs ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Large- 300 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
方法二:
- 使用
zip
迭代 - 使用来自这些元组的
ranges(start, end)
创建nx.complete_graph
- 使用
nx.disjoint_union_all
合并图表
- 获取此图的边
borders
的滚动 2 元组
import numpy as np
import networkx as nx
def get_coo(borders):
graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]
edges = nx.disjoint_union_all(graphs).edges()
return edges
%timeit get_coo(borders)
#Small- 116 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 207 ms ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
输出比以前快一点,但缺少节点上的自循环,必须单独添加
使用itertools.product
方法三:
- 使用
zip
迭代 - 对每个子图使用
itertools.product
个完全连接的边列表 - 使用
itertools.chain
“附加”两个迭代器 - Return 它们作为边缘
borders
的滚动 2 元组
import numpy as np
from itertools import product, chain
def get_coo(borders):
edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
return list(edges)
%timeit get_coo(borders)
#Small- 3.91 µs ± 787 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Large- 183 µs ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
这种方法比您当前的方法快大约 25 倍
您当前的方法 - 基准测试
def get_coo(borders):
edge_list = []
for s, e in zip(borders, borders[1:]):
# create fully-connected subgraph
arr = np.arange(s, e)
t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
t = t.T
edge_list.append(t)
edge_list = np.concatenate(edge_list, axis=1)
return edge_list
%timeit get_coo(borders)
#Small- 95.1 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 3.91 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)