Calculating the semantic descriptor of a nested list
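I am trying to compute the semantic descriptors of a nested list and turn the result into a nested dictionary. First I build distinct_words; each word in it will become a key of my final dictionary.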
def build_semantic_descriptors(sentences):
    flat_list = [term for group in sentences for term in group]
    distinct_words = set(flat_list)
    d = {}
    # This only counts how often each word appears overall,
    # not how often it appears together with another word.
    for row in sentences:
        for words in row:
            if words not in d:
                d[words] = 1
            else:
                d[words] += 1
    return d

if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))
Expected output: {'i': {'am': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1, 'believe': 1, 'my': 2, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 1, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'am': {'i': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1}, and so on...}
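This is my code so far. I already have the words I want as keys, but I don't know how to count the words that appear alongside each of them and put those counts into the final dictionary. I tried the counter above, but all it does is count the total number of times each word appears.
Thanks in advance for your help.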
Try this:
from collections import defaultdict
from itertools import product

def build_semantic_descriptors(sentences):
    # d[key][word] = number of sentences in which key and word both appear
    d = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Keep each word once per sentence, in first-seen order, so a
        # repeated word such as "know" is not counted twice.
        unique_words = list(dict.fromkeys(sentence))
        for key, word in product(unique_words, repeat=2):
            if key == word:
                continue  # a word is not part of its own descriptor
            d[key][word] += 1
    return d

if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))
You need to loop over each sentence twice so that every key gets paired with every word. itertools.product does this for you.
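As a small illustration of that pairing, the sketch below checks that itertools.product over a list with itself yields the same pairs as two nested loops. The list `words` is a made-up sample, not part of the original data.

```python
from itertools import product

words = ["i", "am", "sick"]  # hypothetical sample sentence

# Pairs produced by two explicit nested loops.
nested_pairs = [(key, word) for key in words for word in words]

# The same pairs, produced by itertools.product.
product_pairs = list(product(words, words))

print(product_pairs == nested_pairs)  # True
print(len(product_pairs))             # 9
```

Note that the pairs include each word paired with itself, such as ("i", "i"), so those have to be skipped if a word should not count toward its own descriptor.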
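Also note that I used collections.defaultdict here. It is worth learning about: it is a handy utility that supplies a default value when a key is missing, which lets you skip the membership check you wrote.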
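To make the defaultdict behaviour concrete, here is a minimal sketch of the nested form used above. The variable names `counts` and `plain` are only for illustration.

```python
from collections import defaultdict

# A missing outer key yields a new inner defaultdict(int),
# and a missing inner key yields 0, so "+= 1" works without any checks.
counts = defaultdict(lambda: defaultdict(int))
counts["i"]["am"] += 1
counts["i"]["am"] += 1
counts["am"]["i"] += 1

# Convert to plain dicts so the printed result is easy to read.
plain = {key: dict(inner) for key, inner in counts.items()}
print(plain)  # {'i': {'am': 2}, 'am': {'i': 1}}
```

The same conversion can be applied to the return value of build_semantic_descriptors if you want the printed output to look like an ordinary nested dictionary.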