Python generator creates duplicates of yield results when appended to a list

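My generator function "returns" (yields) the results I want when I print them, but if I append the results to a list, the list ends up containing many duplicate entries. Why does this happen, and how can I avoid it?

Input: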

input = [
    [
        {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
        {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
    ],
    [{...}]
]

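Here the input dicts share a common orgunit key. Printing isolated inside the generator shows the result I am after, except that I want each element appended to a list: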

>>> print isolated
{'dataElement': 'attr1', 'value': 2, 'orgunit': '013107'}
{'dataElement': 'attr2', 'value': 3, 'orgunit': '013107'}
{'dataElement': 'attr1', 'value': 20, 'orgunit': '013113'}
{'dataElement': 'attr3', 'value': 30, 'orgunit': '013113'}

Method and generator:

def transform(input):
    values = []
    gen = process_event(input)
    for event in gen:
        values.append(event)
    # print
    print values

def process_event(input):
    for i in xrange(len(input)):
        for event in input[i]:
            isolated = {}
            isolated['orgunit'] = event['orgunit']
            for key, value in event.copy().iteritems():
                isolated['dataElement'] = key
                isolated['value'] = value
                # print
                print isolated
                yield isolated

What I get is:

>>> print values
    {
        "dataElement": "attr1", 
        "value": 2, 
        "orgunit": "013107"
    }, 
    {
        "dataElement": "attr1", 
        "value": 2, 
        "orgunit": "013107"
    }...

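It looks like you are modifying the same dictionary every time, so the items returned by the first and second yield are the same object: the list ends up holding several references to one dict. The simple fix is to create a new isolated dict inside the innermost for loop rather than outside it.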
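To make the aliasing concrete, here is a minimal sketch, separate from the code in the question. The names shared_dict_gen and fresh_dict_gen are made up for illustration, and the print calls are written so they also run under Python 3. It also shows why printing inside the generator looks fine: each print happens at the moment of the yield, before the next iteration overwrites the dict.

```python
def shared_dict_gen():
    # One dict is created once and mutated before every yield,
    # which mirrors the structure of the generator in the question.
    item = {}
    for n in range(3):
        item['value'] = n
        yield item


def fresh_dict_gen():
    # A new dict is created for every yield.
    for n in range(3):
        yield {'value': n}


shared = list(shared_dict_gen())
fresh = list(fresh_dict_gen())

print(shared)                  # [{'value': 2}, {'value': 2}, {'value': 2}]
print(fresh)                   # [{'value': 0}, {'value': 1}, {'value': 2}]
print(shared[0] is shared[1])  # True: every entry is the same object
print(fresh[0] is fresh[1])    # False: every entry is its own dict
```

The shared version stores three references to a single dict, so every list entry shows whatever was written to it last.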

So a corrected version of your code is:

def process_event(input):
    for events_list in input:  # iterating over the items directly is more Pythonic than indexing
        for event in events_list:
            orgunit = event['orgunit']  # save the current orgunit
            # Without this del you would also get a "dataElement: orgunit" entry.
            # Note that it modifies the input; delete from a copy of event to leave the input intact.
            del event['orgunit']
            for key, value in event.iteritems():  # no copy is needed here
                isolated = {'orgunit': orgunit}  # the main point: create a new dict for every yield
                isolated['dataElement'] = key
                isolated['value'] = value
                # print
                print isolated
                yield isolated

And the output:

>>> input = [[
...     {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
...     {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
... ]]
>>> transform(input)

{'dataElement': 'attr1', 'value': 2, 'orgunit': '013107'}
{'dataElement': 'attr2', 'value': 3, 'orgunit': '013107'}
{'dataElement': 'attr1', 'value': 20, 'orgunit': '013113'}
{'dataElement': 'attr3', 'value': 30, 'orgunit': '013113'}
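If you would rather not delete the orgunit key from the input dicts, one possible variant is to skip that key while iterating. This is only a sketch: the names process_event_no_mutation, groups and data are made up for illustration, and it uses items() so that it also runs under Python 3.

```python
def process_event_no_mutation(groups):
    for events in groups:
        for event in events:
            orgunit = event['orgunit']
            for key, value in event.items():
                if key == 'orgunit':
                    continue  # skip the orgunit key instead of deleting it
                # A new dict is built for every yield, so no two results alias.
                yield {'orgunit': orgunit, 'dataElement': key, 'value': value}


data = [[
    {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
    {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
]]

values = list(process_event_no_mutation(data))
for row in values:
    print(row)
```

Because the input is left untouched, the generator can be run over the same data again without hitting a KeyError on orgunit.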