How to iterate through nested dicts (counters) and update keys recursively
I'm reading data from a file into a series of lists, like so:
sourceData = [[source, topic, score],[source, topic, score],[source, topic, score]...]
where the source and topic in each list may be the same or different.
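The question doesn't say what the file looks like. As a minimal sketch, assuming a headerless CSV file with exactly three columns (source, topic, score) and a numeric score column, `csv.reader` produces this list of lists directly. `read_source_data` is a hypothetical helper name, and `io.StringIO` stands in for the real file so the example runs on its own.

```python
import csv
import io


def read_source_data(fileobj):
    """Hypothetical helper: read source,topic,score rows into [source, topic, score] lists.

    Assumes a headerless CSV with three columns and a numeric score column.
    """
    rows = []
    for source, topic, score in csv.reader(fileobj):
        rows.append([source, topic, float(score)])  # convert score so it can be averaged later
    return rows


# Stand-in for open("data.csv", newline="") so the example is self-contained
sample = io.StringIO("source1,topic1,2\nsource1,topic2,3\nsource2,topic1,4\n")
sourceData = read_source_data(sample)
print(sourceData)
```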
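What I'm trying to achieve is a dictionary that groups the topics associated with each source, together with their scores. (The scores will later be averaged, but for the purposes of this question let's just list them under their topic, which acts as the key.)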
Ideally, the result would look like a list of nested dicts, like this:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
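The notation above isn't valid Python, since a bare sequence of scores can't follow a key like that. Assuming the intent is for each source to map each of its topics to a list of scores, the structure can be written as a single dict keyed by source (rather than a list of single-key dicts). With that shape, the deferred averaging step becomes a nested dict comprehension. The names and numeric values below are placeholders.

```python
from statistics import mean

# Assumed target shape: {source: {topic: [score, ...]}} with numeric scores
grouped = {
    "SOURCE1": {"TOPIC_A": [1, 2, 3], "TOPIC_B": [4, 5, 6]},
    "SOURCE2": {"TOPIC_A": [2, 4], "TOPIC_C": [7]},
}

# Replace each topic's list of scores with its mean, source by source
averaged = {
    source: {topic: mean(scores) for topic, scores in topics.items()}
    for source, topics in grouped.items()
}
print(averaged)
```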
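I thought the best way to do this would be to create a Counter of the sources, then create a dict for each topic of each source, and save each of those dicts as the value for its corresponding source. However, I can't get the iteration right to produce the desired result.
Here's what I have so far: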
sourceDict = {}
sourceDictList = []
sourceList = []
for row in sourceData:
    # Each row is [source, topic, score]
    source = row[0]
    topic = row[1]
    score = row[2]
    sourceDict = [source, {topic: score}]
    sourceDictList.append(sourceDict)
    sourceList.append(source)
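Here sourceDictList ends up looking like [[source, {topic: score}]...] (essentially the data from the original list of lists, reformatted), while sourceList is just every source (some of them repeated).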
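Then I initialize a Counter and match the sources in the Counter against the sources in sourceDictList; where they match, I save the topic:score dict as the value for that key: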
from collections import Counter

sourceCounter = Counter(sourceList)
for key, val in sourceCounter.items():
    for dictitem in sourceDictList:
        if dictitem[0] == key:
            sourceCounter[key] = dictitem[1]
But the output only keeps the last topic:score dict for each source. So instead of the desired:
[{SOURCE1:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}},
{SOURCE2:{TOPIC_A:SCORE1,SCORE2,SCORE3},
{TOPIC_B:SCORE1,SCORE2,SCORE3},
{TOPIC_C:SCORE1,SCORE2,SCORE3}}...]
I only get:
Counter({SOURCE1: {TOPIC_n: 'SCORE_n'}, SOURCE2: {TOPIC_n: 'SCORE_n'}, SOURCE3: {TOPIC_n: 'SCORE_n'}})
I was under the impression that saving a unique key to a dict would add the key:value pair without overwriting the previous keys. Am I missing something?
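That impression is only half right. Assigning to a key that isn't in the dict yet adds a new pair, but assigning to a key that already exists replaces its value. `Counter` is a `dict` subclass and behaves the same way here. In the loop above, the key being assigned is the source, which repeats, so each matching row replaces the previous topic:score dict. The placeholder names below show the difference between replacing the outer value and assigning into the inner dict.

```python
sources = {}
sources["SOURCE1"] = {"TOPIC_A": 1}   # new key: the pair is added
sources["SOURCE2"] = {"TOPIC_B": 2}   # new key: the pair is added
sources["SOURCE1"] = {"TOPIC_C": 3}   # existing key: the old inner dict is replaced
print(sources)  # {'SOURCE1': {'TOPIC_C': 3}, 'SOURCE2': {'TOPIC_B': 2}}

# Assigning *into* the inner dict adds a topic instead of replacing the whole dict
sources["SOURCE1"]["TOPIC_A"] = 1
print(sources)  # {'SOURCE1': {'TOPIC_C': 3, 'TOPIC_A': 1}, 'SOURCE2': {'TOPIC_B': 2}}
```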
Any help is appreciated.
We can do it like this:
sourceData = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score1'],
    ['source1', 'topic1', 'score2'],
    ['source2', 'topic1', 'score1'],
    ['source2', 'topic2', 'score2'],
    ['source2', 'topic1', 'score3'],
]
sourceDict = {}
for row in sourceData:
    source = row[0]
    topic = row[1]
    score = row[2]
    if source not in sourceDict:
        # Runs the first time this source appears:
        # create an empty topic dict for it.
        sourceDict[source] = {}
    if topic not in sourceDict[source]:
        # Runs the first time this topic appears
        # within this source: create an empty score list.
        sourceDict[source][topic] = []
    sourceDict[source][topic].append(score)
print(sourceDict)
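The two `if ... not in` checks can be collapsed with `dict.setdefault`, which returns the existing value for a key, or inserts the given default and returns it. This is an alternative to the answer above, not part of it. It uses the same sample data and builds the same result. One trade-off: the default `{}` or `[]` is constructed on every call, even when it ends up unused.

```python
sourceData = [
    ['source1', 'topic1', 'score1'],
    ['source1', 'topic2', 'score1'],
    ['source1', 'topic1', 'score2'],
    ['source2', 'topic1', 'score1'],
    ['source2', 'topic2', 'score2'],
    ['source2', 'topic1', 'score3'],
]

sourceDict = {}
for source, topic, score in sourceData:
    # setdefault stores {} / [] only the first time a source / topic is seen,
    # then returns the stored object so the score can be appended to it
    sourceDict.setdefault(source, {}).setdefault(topic, []).append(score)

print(sourceDict)
# {'source1': {'topic1': ['score1', 'score2'], 'topic2': ['score1']},
#  'source2': {'topic1': ['score1', 'score3'], 'topic2': ['score2']}}
```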
You can simply use defaultdict from collections:
from collections import defaultdict

sourdata = [['source', 'topic', 2], ['source', 'topic', 3], ['source', 'topic2', 3], ['source2', 'topic', 4]]
sourceDict = defaultdict(dict)
for source, topic, score in sourdata:
    topicScoreDict = sourceDict[source]
    topicScoreDict[topic] = topicScoreDict.get(topic, []) + [score]

>>> print(sourceDict)
defaultdict(<class 'dict'>, {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}})
>>> print(dict(sourceDict))
{'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}
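The following variant is not from the answer above. It nests a `defaultdict(list)` inside the outer `defaultdict`, which lets each score be appended in place. The answer's `topicScoreDict.get(topic, []) + [score]` instead builds a new list on every row. Converting both levels back to plain dicts at the end gives the same contents as `dict(sourceDict)` above.

```python
from collections import defaultdict

sourdata = [['source', 'topic', 2], ['source', 'topic', 3],
            ['source', 'topic2', 3], ['source2', 'topic', 4]]

# Outer level: source -> inner defaultdict; inner level: topic -> list of scores
sourceDict = defaultdict(lambda: defaultdict(list))
for source, topic, score in sourdata:
    sourceDict[source][topic].append(score)

# Convert both levels back to plain dicts for display or serialization
result = {source: dict(topics) for source, topics in sourceDict.items()}
print(result)  # {'source': {'topic': [2, 3], 'topic2': [3]}, 'source2': {'topic': [4]}}
```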