Converting from a list to a dictionary for a word2vec model

Question

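I have a large amount of data in a text file that I want to use to train a skip-gram model. I have split the file's contents into a list of words. Now I want to count how often each word occurs and build a dictionary with the word as the key and its frequency as the value. Here is my code snippet: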

import collections

with open("enwik8", "r") as data:
    words = data.read().split()

vocabulary_size = 5000

# Placeholder entry for unknown words, followed by the most common words
count = [['UNK', -1]]
count.extend(collections.Counter(words).most_common(vocabulary_size - 1))

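I have successfully built a list of the 50000 most common words together with their frequencies. Now I need to put them into a dictionary, with the word as the key and the frequency as the value.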

dictionary = dict()
for word, _ in count:

Can anyone help me with this?

Answer 1

Assuming you already have your list of words and their counts, here is how you can build the dictionary you need from it:

word_dict = dict()
# Iterate over count, the list of (word, frequency) pairs, not the raw words list
for word_count in count:
    if word_count[0] not in word_dict:
        word_dict[word_count[0]] = word_count[1]

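Your list contains tuples, so in word_dict[word_count[0]] the first item of each tuple, the word, becomes the dictionary key, and the second item, word_count[1], which is the count, becomes the value stored under that key.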
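Since every entry in the list is a (word, frequency) pair, the same dictionary can probably be built in one step by passing the list straight to dict(). The following is only a minimal sketch: the short sample sentence and the small vocabulary_size are made up for illustration and stand in for the real enwik8 data.

```python
import collections

# Hypothetical sample text standing in for the words read from the corpus file.
words = "the cat sat on the mat the cat".split()

# Deliberately tiny for illustration; the question uses a much larger value.
vocabulary_size = 3

# Same construction as in the question: an 'UNK' placeholder plus the most common words.
count = [['UNK', -1]]
count.extend(collections.Counter(words).most_common(vocabulary_size - 1))

# dict() accepts any iterable of two-item pairs, so the list converts directly.
word_dict = dict(count)

print(word_dict)  # {'UNK': -1, 'the': 3, 'cat': 2}
```

One difference from the explicit loop: if a word appeared twice in the list, dict() would keep the last value while the loop keeps the first. With the output of most_common() each word appears only once, so both approaches should give the same result here.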

python

dictionary

nlp

word2vec