有向图中的最大公共子图
Maximum Common Subgraph in a Directed Graph
我试图将一组句子表示为有向图,其中一个词由一个节点表示。如果重复一个词,则不重复该节点,而是使用先前存在的节点。我们称此图为 MainG
.
接着,我取一个新句子,创建这个句子的有向图(称这个图为SubG
),然后在[=11=中寻找SubG
的最大公共子图].
我在 Python 3.5 中使用 NetworkX api。我知道这是正常图的 NP 完全问题,但对于有向图来说这是一个线性问题。我提到的链接之一:
How can I find Maximum Common Subgraph of two graphs?
我尝试执行以下代码:
import networkx as nx
import pandas as pd
import nltk
class GraphTraversal:
def createGraph(self, sentences):
DG=nx.DiGraph()
tokens = nltk.word_tokenize(sentences)
token_count = len(tokens)
for i in range(token_count):
if i == 0:
continue
DG.add_edges_from([(tokens[i-1], tokens[i])], weight=1)
return DG
def getMCS(self, G_source, G_new):
"""
Creator: Bonson
Return the MCS of the G_new graph that is present
in the G_source graph
"""
order = nx.topological_sort(G_new)
print("##### topological sort #####")
print(order)
objSubGraph = nx.DiGraph()
for i in range(len(order)-1):
if G_source.nodes().__contains__(order[i]) and G_source.nodes().__contains__(order[i+1]):
print("Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
objSubGraph.add_node(order[i])
objSubGraph.add_node(order[i+1])
objSubGraph.add_edge(order[i], order[i+1])
else:
print("Does Not Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
continue
obj_graph_traversal = GraphTraversal()
SourceSentences = "A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ."
SourceGraph = obj_graph_traversal.createGraph(SourceSentences)
TestSentence_1 = "not much of a story" #ThisWorks
TestSentence_1 = "not much of a story of what is good" #This DOES NOT Work
TestGraph = obj_graph_traversal.createGraph(TestSentence_1)
obj_graph_traversal.getMCS(SourceGraph, TestGraph)
因为我正在尝试进行拓扑排序,所以第二个不起作用。
有兴趣了解可能的方法。
以下代码从有向图中获取最大公共子图:
def getMCS(self, G_source, G_new):
matching_graph=nx.Graph()
for n1,n2,attr in G_new.edges(data=True):
if G_source.has_edge(n1,n2) :
matching_graph.add_edge(n1,n2,weight=1)
graphs = list(nx.connected_component_subgraphs(matching_graph))
mcs_length = 0
mcs_graph = nx.Graph()
for i, graph in enumerate(graphs):
if len(graph.nodes()) > mcs_length:
mcs_length = len(graph.nodes())
mcs_graph = graph
return mcs_graph
Bonson 答案的编辑队列已满,但它不再适用于 networkx 2.4,并且有一些可能的改进:
connected_component_subgraphs
已在 networkx 2.4 中删除,connected_components
应该使用 return 一组节点。
因为只有节点数是求最大分量所以可以大大简化。
这不再是专门针对最初的问题量身定制的,因为如果搜索 "Maximum Common Subgraph in a Directed Graph",这是最好的选择,我需要它来寻找完全不同的东西
我的改编版本是:
def getMCS(g1, g2):
matching_graph=networkx.Graph()
for n1,n2 in g2.edges():
if g1.has_edge(n1, n2):
matching_graph.add_edge(n1, n2)
components = networkx.connected_components(matching_graph)
largest_component = max(components, key=len)
return networkx.induced_subgraph(matching_graph, largest_component)
如果最后一行被替换为 return networkx.induced_subgraph(g1, largest_component)
它应该也能正常工作并且 return 一个有向图。
我试图将一组句子表示为有向图,其中一个词由一个节点表示。如果重复一个词,则不重复该节点,而是使用先前存在的节点。我们称此图为 MainG
.
接着,我取一个新句子,创建这个句子的有向图(称这个图为SubG
),然后在[=11=中寻找SubG
的最大公共子图].
我在 Python 3.5 中使用 NetworkX api。我知道这是正常图的 NP 完全问题,但对于有向图来说这是一个线性问题。我提到的链接之一:
How can I find Maximum Common Subgraph of two graphs?
我尝试执行以下代码:
import networkx as nx
import pandas as pd
import nltk
class GraphTraversal:
def createGraph(self, sentences):
DG=nx.DiGraph()
tokens = nltk.word_tokenize(sentences)
token_count = len(tokens)
for i in range(token_count):
if i == 0:
continue
DG.add_edges_from([(tokens[i-1], tokens[i])], weight=1)
return DG
def getMCS(self, G_source, G_new):
"""
Creator: Bonson
Return the MCS of the G_new graph that is present
in the G_source graph
"""
order = nx.topological_sort(G_new)
print("##### topological sort #####")
print(order)
objSubGraph = nx.DiGraph()
for i in range(len(order)-1):
if G_source.nodes().__contains__(order[i]) and G_source.nodes().__contains__(order[i+1]):
print("Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
objSubGraph.add_node(order[i])
objSubGraph.add_node(order[i+1])
objSubGraph.add_edge(order[i], order[i+1])
else:
print("Does Not Contains Nodes {0} -> {1} ".format(order[i], order[i+1]))
continue
obj_graph_traversal = GraphTraversal()
SourceSentences = "A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story ."
SourceGraph = obj_graph_traversal.createGraph(SourceSentences)
TestSentence_1 = "not much of a story" #ThisWorks
TestSentence_1 = "not much of a story of what is good" #This DOES NOT Work
TestGraph = obj_graph_traversal.createGraph(TestSentence_1)
obj_graph_traversal.getMCS(SourceGraph, TestGraph)
因为我正在尝试进行拓扑排序,所以第二个不起作用。
有兴趣了解可能的方法。
以下代码从有向图中获取最大公共子图:
def getMCS(self, G_source, G_new):
matching_graph=nx.Graph()
for n1,n2,attr in G_new.edges(data=True):
if G_source.has_edge(n1,n2) :
matching_graph.add_edge(n1,n2,weight=1)
graphs = list(nx.connected_component_subgraphs(matching_graph))
mcs_length = 0
mcs_graph = nx.Graph()
for i, graph in enumerate(graphs):
if len(graph.nodes()) > mcs_length:
mcs_length = len(graph.nodes())
mcs_graph = graph
return mcs_graph
Bonson 答案的编辑队列已满,但它不再适用于 networkx 2.4,并且有一些可能的改进:
connected_component_subgraphs
已在 networkx 2.4 中删除,connected_components
应该使用 return 一组节点。因为只有节点数是求最大分量所以可以大大简化。
这不再是专门针对最初的问题量身定制的,因为如果搜索 "Maximum Common Subgraph in a Directed Graph",这是最好的选择,我需要它来寻找完全不同的东西
我的改编版本是:
def getMCS(g1, g2):
matching_graph=networkx.Graph()
for n1,n2 in g2.edges():
if g1.has_edge(n1, n2):
matching_graph.add_edge(n1, n2)
components = networkx.connected_components(matching_graph)
largest_component = max(components, key=len)
return networkx.induced_subgraph(matching_graph, largest_component)
如果最后一行被替换为 return networkx.induced_subgraph(g1, largest_component)
它应该也能正常工作并且 return 一个有向图。