APGL (Another Python Graph Library) - 在广度优先搜索后设置子图大小

Question

我一直在使用apgl 0.8.1。库来分析一个庞大的网络（4000 万个节点）。我尝试了 apgl，因为它适用于 scipy 稀疏矩阵，现在我可以将稀疏矩阵加载到内存中以执行分析。在广度优先搜索分析之后，我有兴趣获得具有所需大小的所有网络的子图。

我正在从 pandas 中读取邻接列表来构建稀疏矩阵。考虑这个名为 network:

的示例网络 (Node,Node,Weight)

这是我使用的示例代码：

# Import Modules
import pandas as pd
import numpy as np
import scipy as sp
from apgl.graph import SparseGraph

# Load Network
df = pd.read_csv(network,sep='\s+',header=None,names=['User1','User2','W'])

# Convert the numpy array from pandas into a NxN square matrix
# Read Numpy array from Pandas
arr = df.values

# Set matrix shapes
shapes = tuple(arr.max(axis=0)[:2]+1)

# Build a Sparse Matrix
matrix = sp.sparse.csr_matrix((arr[:, 2], (arr[:, 0], arr[:, 1])), 
                        shape=shape,
                        dtype=arr.dtype)

# Set the total number of nodes
numVertices = shape[0]

# Inizialize Graph
graph = SparseGraph(numVertices, undirected=False, W=matrix, frmt='csr')

# Perform BFS starting from one one --> set output to np.array
startingnode = 5
bfs = np.array(graph.breadthFirstSearch(startingnode))

# Return SubGraph with a list of nodes
# Set limit
limit = 5
subgraph = graph.subgraph(bfs[:limit])

哪个returns:

bfs = [ 5  1  2  6 14 11  3  7 12  4  8 13]
subgraph = SparseGraph: vertices 5, edges 6, directed, vertex storage GeneralVertexList, edge storage <class 'scipy.sparse.csr.csr_matrix'>

所以我将结果子图的节点限制为 5 个。但是无论搜索的形状如何，节点都是从第一个到第五个中选择的。我的意思是，广度优先搜索算法正在寻找邻居的邻居等等，我想设置一个限制，其中包括在最后选择的节点的所有最终邻居级别的搜索。在示例中，子图包含 BFS 数组的前五个节点，因此：

子图 = [5 1 2 6 14]

但我还想包括节点 7（完成从节点 5 开始的邻居级别）和节点 3（完成节点 2 级别的搜索）。所以得到的节点数组应该是：

子图 = [5 1 2 3 6 7 14]

如有任何帮助，我们将不胜感激。

编辑：我想做的是从一个随机节点开始找到整个图的子图，并执行 BFS 算法直到子图的不同大小，例如500万、1000万和2000万个节点。我想在停止之前完成每个级别的节点邻居，所以无论节点数量是500万还是500万和100，如果需要最后100个节点来完成所有邻居的级别在搜索中找到的最后一个节点。

Answer 1

BFS 和 DFS 在无向图上运行，这些图可以是无环的（即树）也可以不是。

在运行过程中，两种算法都可能遇到回退边缘。也就是说，如果遍历这些边，算法就会考虑已经访问过的节点。

BFS 或 DFS 的输出（通常）不会报告这些边及其末端的节点。

Networkx (https://networkx.github.io/) contains dfs_labeled_edges 其中 returns 一个迭代器，每条边都有一个特征。

就您的代码而言：

limit = 5
subgraph = graph.subgraph(bfs[:limit])

这不会搜索 'length' 5 的所有 BFS 子图。由于 bfs[:5] 这将始终查看 BFS 输出的前 5 个条目。

如果您正在寻找循环，也许您可以使用不同的算法（For example, this is again from Networkx), or extract your subgraph from the original network and then use DFS to label its edges, or enumerate all of its simplest paths 并尝试对其进行处理。

希望这对您有所帮助。

补充信息：

（我在这里也考虑了你这个老问题：）

您想从主图G(V,E) 中提取一个适当的子图U(W,C)，其中U 的阶数远小于G 的阶数(|U|<<|G| ) 并且 U 是从 G(V,E) 的某个节点 v_i 开始的 G 的 BFS。

有两种方法可以做到这一点：

编写您自己的 BFS，您可以在其中添加深度计数器遍历和遍历的节点，并使用它们来中断任何你喜欢的算法。由于您拥有的节点数量非常多，因此您应该研究算法的迭代版本而不是递归版本。更多信息，请参阅：Way to go from recursion to iteration and also to some extent http://en.wikipedia.org/wiki/Depth-limited_search。这种方法会更有效，因为 BFS 不必遍历整个图。
截断现有 BFS 的输出并使用剩余节点作为下一步的起点。

在任何一种情况下，您的算法都将包含一个迭代步骤，最终看起来像这样（这里是选项 #2）：

#Given graph G(V,E) and NNodeLimit (Natural number)
#Produce a set Q of BFS proper subgraphs bearing the characteristics of U.
Q = []
nextBFSNode = [0]

while len(nextBFSNode):
    #Pop the node
    startingPoint = nextBFSNode.pop()
    #Build a BFS graph starting from some node
    q = BFS(G, startingPoint)
    #Truncate its output and save it to the list.
    Q.append(subgraph(G,q[:NNodeLimit]))
    #Add the remaining nodes as future starting points
    nextBFSNode = set(q[NNodeLimit+1:])

不同树之间当然会有相当大的重叠。

希望这对您有所帮助。

APGL (Another Python Graph Library) - 在广度优先搜索后设置子图大小

APGL (Another Python Graph Library) - Set subgraph size after a Breadth First Search

python

graph

breadth-first-search

scipy

sparse-matrix