Remove edges having a weight in a specific range in networkx

Question

I have a dataframe that I convert into a graph. After that, I want to remove the edges whose weight falls within a specific range:

Columns:

df.columns:
Index(['source', 'target', 'weight'], dtype='object')

Length:

len(df)
1048575

Data types:

df.dtypes
source      int64
target    float64
weight      int64
dtype: object

Now, build a networkx graph:

import networkx as nx
Graphtype = nx.Graph()
G = nx.from_pandas_edgelist(df, 'source', 'target', edge_attr='weight', create_using=Graphtype)

Graph info:

print(nx.info(G))
Name: 
Type: Graph
Number of nodes: 609627
Number of edges: 915549
Average degree:   3.0036

Degrees:

degrees = sorted(G.degree, key=lambda x: x[1], reverse=True)
degrees
[(a, 1111),
 (c, 1107),
 (f, 836),
 (g, 722),
 (h, 608),
 (k, 600),
 (r, 582),
 (z, 557),
 (l, 417), etc....

What I want to do is remove edges with certain weights. For example, I want to remove all edges with weight >= 500:

to_remove = [(a,b) for a,b in G.edges(data=True) if "weight" >= 500]
G.remove_edges_from(to_remove)

However, I get the following error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-df3a67f18df9> in <module>()
----> 1 to_remove = [(a,b) for a,b in G.edges(data=True) if "weight" >=500]

<ipython-input-12-df3a67f18df9> in <listcomp>(.0)
----> 1 to_remove = [(a,b) for a,b in G.edges(data=True) if "weight" >=500]

ValueError: too many values to unpack (expected 2)

Any idea why I get this message? Or is there perhaps a better way to do this? Thanks.

Answer 1

This is how I solved it:

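threshold = 500

# G.edges.data('weight') yields (u, v, weight) triples; keep those at or
# above the threshold (>= to match the ">= 500" condition in the question)
long_edges = list(filter(lambda e: e[2] >= threshold, G.edges.data('weight')))
# keep only the (u, v) pair of each selected edge
le_ids = [e[:2] for e in long_edges]

# remove the selected edges from graph G
G.remove_edges_from(le_ids)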
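
Since the title asks about a weight range rather than a single cutoff, the same idea can be extended to a lower and an upper bound. The following is only a sketch: `remove_edges_in_weight_range` is a hypothetical helper name, the closed interval [low, high] is an assumption about what "range" means here, and the toy graph stands in for the real data.

```python
import networkx as nx


def remove_edges_in_weight_range(G, low, high):
    """Remove edges whose 'weight' lies in the closed interval [low, high].

    Hypothetical helper for illustration; returns the removed (u, v) pairs.
    """
    # Build the list first: removing edges while iterating over G.edges
    # would modify the graph during iteration.
    to_remove = [
        (u, v)
        for u, v, w in G.edges.data("weight")
        if w is not None and low <= w <= high  # skip edges without a weight
    ]
    G.remove_edges_from(to_remove)
    return to_remove


# Toy graph standing in for the real data
G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 100), (2, 3, 500), (3, 4, 750), (4, 5, 1200)])

removed = remove_edges_in_weight_range(G, 500, 1000)
print(removed)
print(sorted(G.edges))
```

Note that `remove_edges_from` only drops edges; nodes that end up with no neighbours stay in the graph.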

Answer 2

In this comprehension, you are comparing the string "weight" with an int, which doesn't make much sense:

[(a,b) for a,b in G.edges(data=True) if "weight" >= 500]

Also, what actually causes the exception is that when you pass data=True, you get 3-tuples whose third element is the attribute dictionary.
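
To see the shape of what the edge view yields, a one-edge toy graph is enough. This is a small illustrative sketch (the graph and variable names are made up for the demonstration), showing both the `data=True` form and the `data="weight"` form, and reproducing the unpacking error:

```python
import networkx as nx

G = nx.Graph()
G.add_edge(1, 2, weight=700)

# data=True yields (u, v, attribute_dict) triples
edges_with_dict = list(G.edges(data=True))
print(edges_with_dict)

# data="weight" yields (u, v, weight) triples instead
edges_with_weight = list(G.edges(data="weight"))
print(edges_with_weight)

# Unpacking a triple into two names reproduces the error from the question
error_message = None
try:
    for a, b in G.edges(data=True):
        pass
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```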

What you probably want to do is:

[(a,b) for a, b, attrs in G.edges(data=True) if attrs["weight"] >= 500]
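
As for the "better way" part of the question, one possible alternative, assuming the unwanted edges are never needed in the graph at all, is to filter the dataframe before building the graph. The sketch below uses a made-up four-row frame with the same column names as in the question; the cutoff of 500 is taken from the question.

```python
import networkx as nx
import pandas as pd

# Toy frame with the same column names as in the question
df = pd.DataFrame(
    {
        "source": [1, 2, 3, 4],
        "target": [2.0, 3.0, 4.0, 5.0],
        "weight": [100, 500, 750, 20],
    }
)

# Keep only rows whose weight is below the cutoff, then build the graph
light = df[df["weight"] < 500]
G = nx.from_pandas_edgelist(light, "source", "target", edge_attr="weight")

print(G.number_of_edges())
print(sorted(G.edges(data="weight")))
```

One difference from removing edges afterwards: nodes that only appear in filtered-out rows never enter the graph, whereas `remove_edges_from` leaves them behind as isolated nodes.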


Tags: python, networkx, pandas