Assigning Group ID to components in networkx

Question

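I have a graph whose nodes store the "parentid" of hotels and their "phone_search" values. My main aim in building this graph is to connect all "parentid"s that share a "phone_search" value, transitively. For example, if parentid A has phone_search 1,2; B has 2,3; C has 3,4; D has 5,6; and E has 6,7, then A, B and C should be grouped into one cluster, and D and E into another.

This is my code for building the network: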

from pymongo import MongoClient  # To import client for MongoDB
import networkx as nx
import pickle

G = nx.Graph()

#Defining variables
hotels = []
phones = []
allResult = []
finalResult = []

#dictNx = {}

# Initializing MongoDB client
client = MongoClient()

# Connection
db = client.hotel
collection = db.hotelData

for post in collection.find():
    hotels.append(post)

for hotel in hotels:
    try:
        phones = hotel["phone_search"].split("|")
    except AttributeError:
        # phone_search is not a string (e.g. a single number): treat it as one value
        phones = [hotel["phone_search"]]
    for phone in phones:
        if phone != '':
            G.add_edge(hotel["parentid"], phone)

# nx.write_gml(G,"export.gml")
# Pickle writes bytes, so the file must be opened in binary mode
with open('/home/justdial/newHotel/graph.txt', 'wb') as f:
    pickle.dump(G, f)

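What I want to do: I want to assign a group ID to each component and store it in a dictionary, so that I can easily look the groups up directly from the dictionary every time.

Example: Gid 1 would contain the parentids and phone_searches that belong to one cluster. Similarly, Gid 2 would contain the nodes from another cluster, and so on.

I also have another question. Is accessing the nodes from a dictionary by group ID faster than performing a BFS on the networkx graph?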

Answer 1

You basically want a list of nodes per component (not per cluster), which is fairly straightforward. You need connected_component_subgraphs().

import networkx as nx

G = nx.caveman_graph(3, 4)  # generate example with 3 components of four members each
components = nx.connected_component_subgraphs(G)

# map a running index (the group ID) to the list of nodes in each component
comp_dict = {idx: list(comp.nodes()) for idx, comp in enumerate(components)}
print(comp_dict)
# {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9, 10, 11]}
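
A note for newer installations: connected_component_subgraphs() was removed in networkx 2.4. Below is a minimal sketch of the same dictionary for newer releases, assuming you only need the node lists and not subgraph objects. It uses nx.connected_components(), which yields one set of nodes per component; the sorting is only there to make the printed lists stable.

```python
import networkx as nx

G = nx.caveman_graph(3, 4)  # three disconnected cliques of four nodes each

# connected_components yields one set of nodes per component;
# enumerate supplies the running group ID.
comp_dict = {
    idx: sorted(nodes)
    for idx, nodes in enumerate(nx.connected_components(G))
}
print(comp_dict)
```

If you do need a subgraph for a given group, G.subgraph(comp_dict[idx]) should give you a view of it.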

If you would rather have the component ID as a node attribute:

attr = {n: comp_id for comp_id, nodes in comp_dict.items() for n in nodes}

# argument order as used by networkx 1.x
nx.set_node_attributes(G, "component", attr)
print(list(G.nodes(data=True)))
# [(0, {'component': 0}), (1, {'component': 0}), (2, {'component': 0}), (3, {'component': 0}), (4, {'component': 1}), (5, {'component': 1}), (6, {'component': 1}), (7, {'component': 1}), (8, {'component': 2}), (9, {'component': 2}), (10, {'component': 2}), (11, {'component': 2})]

Answer 2

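I am posting this as an answer only because I lack the reputation to comment.

The "set_node_attributes" function changed its argument order between v1.x and v2.0 to allow more options for loading attributes. The order is now (G, values, name) rather than (G, name, values).

If you use keyword arguments, the order does not matter: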

nx.set_node_attributes(G, name='component', values=attr)
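
To tie this back to the question, here is a rough end-to-end sketch on the A to E example from the question, assuming networkx 2.x or later. The `hotels` dict stands in for the MongoDB documents, and the "phone:" prefix is my own assumption to keep phone values from colliding with parentids; neither comes from the original code. It builds both a group ID to members dictionary and a node to group ID dictionary, then stores the latter as a node attribute using the keyword form.

```python
import networkx as nx

# Toy stand-in for the MongoDB documents: parentid -> pipe-separated phone_search
hotels = {"A": "1|2", "B": "2|3", "C": "3|4", "D": "5|6", "E": "6|7"}

G = nx.Graph()
for parentid, phone_search in hotels.items():
    for phone in phone_search.split("|"):
        if phone:  # skip empty entries
            # "phone:" prefix is an illustrative assumption, not in the original
            G.add_edge(parentid, "phone:" + phone)

# group ID -> sorted list of member nodes (parentids and phones)
groups = {
    gid: sorted(nodes)
    for gid, nodes in enumerate(nx.connected_components(G))
}

# node -> group ID, for direct lookups
group_of = {node: gid for gid, nodes in groups.items() for node in nodes}

# keyword arguments sidestep the v1.x / v2.0 ordering difference
nx.set_node_attributes(G, name="component", values=group_of)

print(groups)
print(group_of["A"] == group_of["C"])  # A and C are linked through B
print(group_of["A"] == group_of["D"])  # D belongs to the other group
```

On the speed question: once these dictionaries are built, a lookup is an average constant-time operation, whereas a BFS has to walk the whole component again on every query. The dictionaries do need to be rebuilt whenever the graph changes.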
