Calculating the semantic descriptor of a nested list

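I am trying to calculate the semantic descriptor of a nested list and turn it into a nested dictionary. First I built distinct_words; each of its words will become a key in my final dictionary.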

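def build_semantic_descriptors(sentences):
    # Every distinct word should become a key of the final dictionary.
    flat_list = [term for group in sentences for term in group]
    distinct_words = set(flat_list)

    # This only counts how often each word appears overall,
    # not which words appear alongside it.
    d = {}
    for row in sentences:
        for words in row:
            if words not in d:
                d[words] = 1
            else:
                d[words] += 1
    return d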


if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))

Expected output: {'i': {'am': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1, 'believe': 1, 'my': 2, 'liver': 1, 'is': 1, 'diseased': 1, 'however': 1, 'know': 1, 'nothing': 1, 'at': 1, 'all': 1, 'about': 1, 'disease': 1, 'and': 1, 'do': 1, 'not': 1, 'for': 1, 'certain': 1, 'what': 1, 'ails': 1, 'me': 1}, 'am': {'i': 3, 'a': 2, 'sick': 1, 'man': 3, 'spiteful': 1, 'an': 1, 'unattractive': 1}, etc...}

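This is my code so far. I already have the words I want as keys, but I don't know how to count the words that appear alongside each of them and put those counts into the final dictionary. I tried the counter above, but all it does is count how many times each word appears overall.

Thanks in advance for your help.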

Try this:

from collections import defaultdict
from itertools import product


def build_semantic_descriptors(sentences):
    # Outer dict maps each word to an inner dict of co-occurrence counts.
    d = defaultdict(lambda: defaultdict(int))

    for sentence in sentences:
        # Pair every word in the sentence with every word in the same sentence.
        for key, word in product(sentence, sentence):
            # A word should not be listed under itself, so skip self-pairs.
            if key == word:
                continue
            d[key][word] += 1
    return d


if __name__ == '__main__':
    x = [["i", "am", "a", "sick", "man"],
         ["i", "am", "a", "spiteful", "man"],
         ["i", "am", "an", "unattractive", "man"],
         ["i", "believe", "my", "liver", "is", "diseased"],
         ["however", "i", "know", "nothing", "at", "all", "about", "my",
          "disease", "and", "do", "not", "know", "for", "certain", "what", "ails", "me"]]
    print(build_semantic_descriptors(x))

You need to loop over each sentence twice so that, for every key, you visit every word in that sentence. itertools.product does this for you.
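
To make the double loop concrete, here is a small sketch of the pairs that itertools.product yields. The three-word sentence is made up purely for illustration.

```python
from itertools import product

# Hypothetical short sentence, used only for illustration.
sentence = ["i", "am", "sick"]

# product(sentence, sentence) yields every (key, word) combination,
# including the pairs where a word is matched with itself.
pairs = list(product(sentence, sentence))
print(len(pairs))  # 3 words give 3 * 3 = 9 pairs

# Dropping the self-pairs leaves each word paired with the other words.
others = [(key, word) for key, word in pairs if key != word]
print(others)
```

It is equivalent to writing two nested for loops over the same sentence, just flattened into one loop.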

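Also note that I used collections.defaultdict here. It is worth reading up on: it is a handy utility that supplies a default value when a key does not exist in the dictionary, which lets you skip the membership check you were doing.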
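
As a rough sketch of both points: the first part shows defaultdict(int) replacing the `if words not in d` check, and the second part is a variant that counts each pair at most once per sentence. The expected output in the question lists 'know': 1 under 'i' even though "know" appears twice in the last sentence, so this variant assumes that is the intended behaviour. The name build_semantic_descriptors_once is made up for this example, and the conversion to plain dicts is only there to make the printed result easier to read.

```python
from collections import defaultdict
from itertools import product

# A missing key is created with int() == 0 on first access,
# so no "if word not in counts" check is needed.
counts = defaultdict(int)
for word in ["i", "am", "i"]:
    counts[word] += 1
print(dict(counts))  # {'i': 2, 'am': 1}


def build_semantic_descriptors_once(sentences):
    """Hypothetical variant: count each word pair once per sentence."""
    d = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # Assumption: a word repeated inside one sentence counts only once.
        unique_words = set(sentence)
        for key, word in product(unique_words, unique_words):
            if key != word:
                d[key][word] += 1
    # Convert nested defaultdicts to plain dicts for cleaner printing.
    return {key: dict(inner) for key, inner in d.items()}


if __name__ == '__main__':
    sample = [["i", "am", "a", "sick", "man"],
              ["i", "know", "nothing", "and", "do", "not", "know"]]
    print(build_semantic_descriptors_once(sample)["i"])
```

If repeated words should instead be counted once per occurrence, iterate over the sentence itself rather than over set(sentence).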