How do I convert a weighted edge list in Python to a general edge list?

Question

I have a weighted edge list for my data. Each row holds the source and destination of a connection and its weight. Like this:

  source destination  weight
0      A           B       3
1      A           C       2
2      A           D       3

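I want a general format without the weight values, where each edge is repeated as many times as its weight. The reason is that the application I am using does not take the weight values in the dataset into account. Like this: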

  source destination
0      A           B
1      A           B
2      A           B
3      A           C
4      A           C
5      A           D
6      A           D
7      A           D

I tried reset_index() and unstack(), but the results were nothing like what I need. Any suggestions?

Answer 1

This can be done neatly with a generator function. For simplicity, assume the data is a list of (source, destination, weight) triples.

def weighted_to_general(edges):
    for source, destination, weight in edges:
        # Memory optimization: store the tuple only once
        source_destination = (source, destination)
        for n in range(weight):
            yield source_destination


data = [
    ('A', 'B', 3),
    ('A', 'C', 2),
    ('B', 'D', 3),
]

for source_destination in weighted_to_general(data):
    print(source_destination)

If you need a list, just iterate the generator with list():

general_data = list(weighted_to_general(data))
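The same expansion can also be sketched with the standard library's itertools instead of a hand-written generator. This is only an illustrative variant, assuming the same list of triples and non-negative integer weights:

```python
from itertools import chain, repeat

data = [
    ('A', 'B', 3),
    ('A', 'C', 2),
    ('B', 'D', 3),
]

# repeat((s, d), w) yields the same tuple object w times;
# chain.from_iterable lazily flattens those runs into one stream.
general_data = list(chain.from_iterable(
    repeat((source, destination), weight)
    for source, destination, weight in data
))
print(general_data)
```

Like the generator version, this creates each (source, destination) tuple only once and reuses it for every repetition.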

Answer 2

You can try:

import pandas as pd

df = pd.DataFrame({'source': ['A', 'A', 'B'], 'destination': ['B', 'C', 'D'], 'weight': [3, 2, 3]})

result = list()
for index, row in df.iterrows():
    for x in range(row.weight):
        result.append([row.source, row.destination])
print(pd.DataFrame(result, columns=['source', 'destination']))

Result:

  source destination
0      A           B
1      A           B
2      A           B
3      A           C
4      A           C
5      B           D
6      B           D
7      B           D
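Since iterrows() builds a Series for every row, it can be slow on large frames. A possible variant, sketched here under the assumption that the weight column holds non-negative integers, zips the three columns and expands them in a list comprehension:

```python
import pandas as pd

df = pd.DataFrame({'source': ['A', 'A', 'B'], 'destination': ['B', 'C', 'D'], 'weight': [3, 2, 3]})

# Walk the three columns in parallel instead of materializing each row as a Series
rows = [
    (source, destination)
    for source, destination, weight in zip(df['source'], df['destination'], df['weight'])
    for _ in range(weight)
]
result = pd.DataFrame(rows, columns=['source', 'destination'])
print(result)
```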

Answer 3

You can use pd.Index.repeat(), passing the weight column as the number of repetitions, and then index with the result in df.loc[]:

df.loc[df.index.repeat(df.weight),['source','destination']].reset_index(drop=True)

An alternative using np.repeat():

import numpy as np

final = pd.DataFrame(np.repeat(df[['source', 'destination']].values, df.weight, axis=0),
                     columns=['source', 'destination'])

  source destination
0      A           B
1      A           B
2      A           B
3      A           C
4      A           C
5      A           D
6      A           D
7      A           D
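One way to sanity-check the expansion is to collapse the repeated rows back into counts and compare them with the original weights. A minimal sketch, assuming the question's data and that each (source, destination) pair appears only once in the weighted list:

```python
import pandas as pd

df = pd.DataFrame({
    'source': ['A', 'A', 'A'],
    'destination': ['B', 'C', 'D'],
    'weight': [3, 2, 3],
})

# Repeat each index label weight times, then select those rows
expanded = df.loc[df.index.repeat(df['weight']), ['source', 'destination']].reset_index(drop=True)

# Count the duplicates per edge; the counts should match the original weights
recovered = (
    expanded.groupby(['source', 'destination'], sort=False)
    .size()
    .reset_index(name='weight')
)
print(recovered)
```

If the weighted list contained the same pair more than once, the recovered weight would be the sum of those entries rather than the individual values.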

Tags: python, pandas, edge-list