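How do I combine N dictionaries in a list of dictionaries based on a matching key:value pair?
I want to achieve the following. It is essentially a combination or merge of N dictionaries: accumulate all the data for duplicate ids, and add all values (except id and updated_at) from every dictionary across multiple data sources to the final result.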
class A:
    def __init__(self):
        pass

    def run(self):
        return {"data": [
            {"id": "ID-2002-0201", "updated_at": "2018-05-14T22:25:51Z", "html_url": ["https://github.com/ID-2002-0201"], "source": "github"},
            {"id": "ID-2002-0200", "updated_at": "2018-05-14T21:49:15Z", "html_url": ["https://github.com/ID-2002-0200"], "source": "github"},
            {"id": "ID-2002-0348", "updated_at": "2018-05-11T14:13:28Z", "html_url": ["https://github.com/ID-2002-0348"], "source": "github"},
        ]}


class B:
    def __init__(self):
        pass

    def run(self):
        return {"data": [
            {"id": "ID-2002-0201", "updated_at": "2006-03-28", "html_url": ["http://sample.com/files/1622"], "source": "sample"},
            {"id": "ID-2002-0200", "updated_at": "2006-06-05", "html_url": ["http://sample.com/files/1880"], "source": "sample"},
            {"id": "ID-2002-0348", "updated_at": "2007-03-09", "html_url": ["http://sample.com/files/3441"], "source": "sample"},
        ]}


results = {}
data_sources = [A(), B()]
for data in data_sources:
    data_stream = data.run()
    for data in data_stream.get('data'):
        for key, value in data.items():
            if key in ['html_url']:
                results.setdefault(key, []).extend(value)
            elif key in ['source']:
                results.setdefault(key, []).append(value)
            else:
                results[key] = value
print(results)
Expected output:
[
    {
        "id": "ID-2002-0201",
        "updated_at": "2018-05-14T22:25:51Z",
        "html_url": [
            "https://github.com/ID-2002-0201",
            "https://github.com/ID-2002-0202",
            "https://github.com/ID-2002-0203",
            "https://github.com/ID-2002-0204"
        ],
        "source": [
            "github",
            "xxx",
            "22aas"
        ]
    },
]
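I'm a bit confused, because the desired output you gave doesn't match the sample classes in your code. Still, I think I understand what you want; please correct me if I've misread your question.
I treat your results as a dictionary of dictionaries. The outer dictionary uses each unique id as its key, and each inner dictionary holds the data you want in the output. Once the loops have finished, I simply take list(results.values()) to get the list of N combined dictionaries.
The code is as follows: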
class A:
    def __init__(self):
        pass

    def run(self):
        return {"data": [
            {"id": "ID-2002-0201", "updated_at": "2018-05-14T22:25:51Z", "html_url": ["https://github.com/ID-2002-0201"], "source": "github"},
            {"id": "ID-2002-0200", "updated_at": "2018-05-14T21:49:15Z", "html_url": ["https://github.com/ID-2002-0200"], "source": "github"},
            {"id": "ID-2002-0348", "updated_at": "2018-05-11T14:13:28Z", "html_url": ["https://github.com/ID-2002-0348"], "source": "github"},
        ]}


class B:
    def __init__(self):
        pass

    def run(self):
        return {"data": [
            {"id": "ID-2002-0201", "updated_at": "2006-03-28", "html_url": ["http://sample.com/files/1622"], "source": "sample"},
            {"id": "ID-2002-0200", "updated_at": "2006-06-05", "html_url": ["http://sample.com/files/1880"], "source": "sample"},
            {"id": "ID-2002-0348", "updated_at": "2007-03-09", "html_url": ["http://sample.com/files/3441"], "source": "sample"},
        ]}


results = {}  # outer dict: id -> merged record for that id
data_sources = [A(), B()]
for source in data_sources:
    data_stream = source.run()
    for record in data_stream.get('data'):
        curr_id = record["id"]
        # Fetch the merged record for this id, creating it on first sight.
        result = results.setdefault(curr_id, {})
        for key, value in record.items():
            if key in ['html_url']:
                # html_url is already a list, so extend the accumulated list.
                result.setdefault(key, []).extend(value)
            elif key in ['source']:
                # source is a single string, so append it.
                result.setdefault(key, []).append(value)
            else:
                # Scalar fields (id, updated_at): a later source overwrites an earlier one.
                result[key] = value
print(list(results.values()))
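The same grouping idea can be wrapped in a reusable function. This is only a sketch: the name merge_by_id and its parameters list_keys and collect_keys are made up for illustration and do not come from the question. It also assumes that the first updated_at seen for an id should win, which is what the expected output suggests; the loop above keeps the last one instead.

```python
def merge_by_id(streams, list_keys=("html_url",), collect_keys=("source",)):
    """Merge records that share an "id" across several lists of records.

    Hypothetical helper: list_keys names fields whose values are lists and
    get concatenated; collect_keys names scalar fields gathered into a list.
    """
    merged = {}  # id -> merged record
    for records in streams:
        for record in records:
            entry = merged.setdefault(record["id"], {})
            for key, value in record.items():
                if key in list_keys:
                    entry.setdefault(key, []).extend(value)
                elif key in collect_keys:
                    entry.setdefault(key, []).append(value)
                else:
                    # Assumption: keep the first value seen (id, updated_at).
                    entry.setdefault(key, value)
    return list(merged.values())


github = [{"id": "ID-2002-0201", "updated_at": "2018-05-14T22:25:51Z",
           "html_url": ["https://github.com/ID-2002-0201"], "source": "github"}]
sample = [{"id": "ID-2002-0201", "updated_at": "2006-03-28",
           "html_url": ["http://sample.com/files/1622"], "source": "sample"}]

combined = merge_by_id([github, sample])
print(combined)
```

With the classes above you could call it as merge_by_id(s.run()["data"] for s in data_sources). If you would rather have the most recent source win, replace the final setdefault with a plain assignment.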