Python 根据匹配 key/value 对减少字典列表

Question

我有一个指定流的字典列表（从源跳到目的地及其各自的音量）。现在我想将这些流量拆分为 link（例如（源到流量跳转，跳到目的地流量）并通过汇总它们的流量将所有重复的 link 合并在一起。

因为我是 python 的新手，所以我想知道什么是好的方法。我的第一种方法是遍历所有流并在内部嵌套一个遍历所有 link 的循环，并检查 link 是否已经存在。

但如果我有数百万个流，我想这可能会变得非常低效和缓慢。

我的起始数据是这样的：

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200,
    },
]

我的结果应该是：

links = [
    {
        'source': 1,
        'hop': 2,
        'volume': 150,
    },{
        'hop': 2,
        'destination': 3,
        'volume': 100,
    },{
        'hop': 2,
        'destination': 4,
        'volume': 250,
    },{
        'source': 2,
        'hop': 2,
        'volume': 200,
    },
]

非常感谢您的帮助！

Answer 1

伪算法：

创建一个空结果 list/set/dictionary
循环流列表
将每个流拆分为 2 个链接
对于这 2 个链接中的每一个，测试它们是否已经在结果列表中（基于 2 个节点）。
如果没有：添加它们。如果是：升级列表中已有的音量。

Answer 2

您可以收集指向两个不同词典的链接，一个在源和跃点之间，另一个在跃点和目标之间。然后您可以轻松地从两个字典中单独创建结果列表。 Counter 下方使用了 dict 类似默认值为 0 的对象：

import pprint
from collections import Counter

flows = [
    {
        'source': 1,
        'hop': 2,
        'destination': 3,
        'volume': 100.5,
    },{
        'source': 1,
        'hop': 2,
        'destination': 4,
        'volume': 50,
    },{
        'source': 2,
        'hop': 2,
        'destination': 4,
        'volume': 200.7,
    },
]

sources = Counter()
hops = Counter()

for f in flows:
    sources[f['source'], f['hop']] += f['volume']
    hops[f['hop'], f['destination']] += f['volume']

res = [{'source': source, 'hop': hop, 'volume': vol} for (source, hop), vol in sources.items()]
res.extend([{'hop': hop, 'destination': dest, 'volume': vol} for (hop, dest), vol in hops.items()])
pprint.pprint(res)

输出：

[{'hop': 2, 'source': 1, 'volume': 150.5},
 {'hop': 2, 'source': 2, 'volume': 200.7},
 {'destination': 3, 'hop': 2, 'volume': 100.5},
 {'destination': 4, 'hop': 2, 'volume': 250.7}]

以上运行需要 O(n) 时间，因此如果您有足够的内存，它应该可以处理数百万个流。

Python 根据匹配 key/value 对减少字典列表

Python Reduce List of Dicts based on matching key/value pairs

python

merge

key-value