Python generator creates duplicates of yield results when appended to a list
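My generator function "returns"/yields the results I want (when printed), but when I append the results to one big list, the list ends up with many duplicate entries. Why? How can I avoid this?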
Input:
input = [
    [
        {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
        {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
    ],
    [{...}]
]
where the dicts in the input share a common orgunit key.
If I print isolated, I get the goal (except that I want each element appended to a list):
>>> print isolated
{'dataElement': 'attr1', 'value': '2', 'orgunit': 013107}
{'dataElement': 'attr2', 'value': '3', 'orgunit': 013107}
{'dataElement': 'attr1', 'value': '20', 'orgunit': 013113}
{'dataElement': 'attr3', 'value': '30', 'orgunit': 013113}
Method and generator:
def transform(input):
    values = []
    gen = process_event(input)
    for event in gen:
        values.append(event)
    # print
    print values

def process_event(input):
    for i in xrange(len(input)):
        for event in input[i]:
            isolated = {}
            isolated['orgunit'] = event['orgunit']
            for key, value in event.copy().iteritems():
                isolated['dataElement'] = key
                isolated['value'] = value
                # print
                print isolated
                yield isolated
What I get:
>>> print values
{
    "dataElement": "attr1",
    "value": 2,
    "orgunit": "013107"
},
{
    "dataElement": "attr1",
    "value": 2,
    "orgunit": "013107"
}...
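It looks like you are modifying the same dictionary every time, so the items produced by the first and second yield are the same object: every entry in the list is a reference to that one dict. The simple solution is to create the new isolated dict inside the innermost for loop rather than outside it.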
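The aliasing described above can be reproduced in a few lines. This is a minimal sketch written for Python 3 (the code in this thread is Python 2); the names shared_dict_gen and fresh_dict_gen are made up for the illustration and are not part of the question.

```python
def shared_dict_gen():
    # One dict is created once and mutated before every yield,
    # so every yielded item is the same object.
    record = {}
    for n in range(3):
        record['value'] = n
        yield record


def fresh_dict_gen():
    # A new dict is created for every yield, so the items are independent.
    for n in range(3):
        yield {'value': n}


shared = list(shared_dict_gen())
fresh = list(fresh_dict_gen())

print(shared)  # [{'value': 2}, {'value': 2}, {'value': 2}]
print(all(item is shared[0] for item in shared))  # True
print(fresh)   # [{'value': 0}, {'value': 1}, {'value': 2}]
```

Printing inside the generator looks correct because each print happens before the next mutation; the list only stores references, so it shows the dict's final state repeated.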
So the correct version of your code is:
def process_event(input):
    for events_list in input:  # Iterating over items directly is more Pythonic than indexing
        for event in events_list:
            orgunit = event['orgunit']  # Save the current orgunit
            del event['orgunit']  # Otherwise you also get a "dataElement: orgunit" record. This mutates the input; delete from a copy to leave it intact
            for key, value in event.iteritems():  # No copy is needed here
                isolated = {'orgunit': orgunit}  # The main point: create a new dict on every iteration
                isolated['dataElement'] = key
                isolated['value'] = value
                # print
                print isolated
                yield isolated
And the output:
>>> input = [[
...     {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
...     {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
... ]]
>>> transform(input)
{'dataElement': 'attr1', 'value': 2, 'orgunit': '013107'}
{'dataElement': 'attr2', 'value': 3, 'orgunit': '013107'}
{'dataElement': 'attr1', 'value': 20, 'orgunit': '013113'}
{'dataElement': 'attr3', 'value': 30, 'orgunit': '013113'}
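For readers on Python 3, the same idea can be sketched without deleting anything from the input. This is an illustrative variant, not the answerer's code: the parameter name groups and the variable data are assumptions introduced here, and the record order relies on dicts preserving insertion order (Python 3.7+).

```python
def process_event(groups):
    """Yield one new dict per (orgunit, attribute) pair without mutating the input."""
    for events in groups:
        for event in events:
            orgunit = event['orgunit']
            for key, value in event.items():
                if key == 'orgunit':
                    # The orgunit is carried on every record, not emitted as a data element.
                    continue
                # A new dict is built on every iteration, so each yielded item is independent.
                yield {'orgunit': orgunit, 'dataElement': key, 'value': value}


def transform(groups):
    # Collect every yielded record into a list.
    return list(process_event(groups))


data = [[
    {'orgunit': '013107', 'attr1': 2, 'attr2': 3},
    {'orgunit': '013113', 'attr1': 20, 'attr3': 30},
]]
values = transform(data)
for row in values:
    print(row)
```

If the generator itself cannot be changed, copying each item on the consumer side, for example values.append(dict(event)), also avoids the aliasing, at the cost of one shallow copy per item.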