Word2Vec vocabulary not defined error

Question

I'm new to Python and word2vec, and I keep getting the error "you must first build vocabulary before training the model". What is wrong with my code?

Here is my code:

file_object=open("SupremeCourt.txt","w")
from gensim.models import word2vec

data = word2vec.Text8Corpus('SupremeCourt.txt')
model = word2vec.Word2Vec(data, size=200)

out=model.most_similar()

print(out[1])
print(out[2])

Answer 1

You are opening the file in write mode with this line:

file_object = open("SupremeCourt.txt", "w")

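Doing so erases the file's contents, so by the time you pass the file to the model for training there is no data left to read. That is why the error is raised.

Remove that line (and restore the contents of your file) and it will work.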
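To see the truncation for yourself, here is a minimal sketch that uses only the standard library. The file name `corpus.txt` and the sample sentence are made up for illustration; they stand in for SupremeCourt.txt and its contents.

```python
import os
import tempfile

# Hypothetical stand-in for SupremeCourt.txt, written to a temporary directory.
corpus_path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
sample_text = "the court held that the statute was valid\n"

with open(corpus_path, "w") as f:
    f.write(sample_text)
size_before = os.path.getsize(corpus_path)

# Opening in "w" mode truncates the file at once, even if nothing is written.
open(corpus_path, "w").close()
size_after_w = os.path.getsize(corpus_path)

# Restore the contents, then open in "r" mode: the data is left untouched.
with open(corpus_path, "w") as f:
    f.write(sample_text)
with open(corpus_path, "r") as f:
    text_after_r = f.read()

print(size_before, size_after_w, len(text_after_r))
```

The second number printed is the file size right after the "w" open, which is what the corpus reader would see in the original code.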

Answer 2

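I can see a couple of problems in your code: the file is opened in write mode, and the word you look up with most_similar has to be one that the model's vocabulary actually contains. I would suggest either loading a pretrained model such as the Google News vectors into gensim, or building your own word2vec model from a corpus that is not empty, so that you don't get the error. Also, most_similar in gensim takes the query word as an argument: out = model.most_similar("word-name")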

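from gensim.models import word2vec

# Open the corpus for reading ("r"), not writing, so its contents are preserved.
# (Text8Corpus opens the file itself, so this handle is not strictly needed.)
file_object = open("SupremeCourt.txt", "r")

data = word2vec.Text8Corpus('SupremeCourt.txt')
model = word2vec.Word2Vec(data, size=200)  # or load the Google News vectors here instead
# Note: gensim 4.x renamed size to vector_size and moved most_similar to model.wv.

out = model.most_similar("word")
print(out)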
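For newer gensim releases, the same idea might look like the sketch below. It assumes gensim 4.x is installed, where the parameter is named `vector_size` and queries go through `model.wv`. The helpers `count_tokens` and `train_and_query` are hypothetical names introduced here, and `min_count=1` is an assumption so that a small corpus is not pruned down to an empty vocabulary (the default is 5).

```python
def count_tokens(path):
    """Return the number of whitespace-separated tokens in a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(len(line.split()) for line in f)


def train_and_query(path, word, topn=5):
    """Train a Word2Vec model on `path`; return neighbours of `word`, or [] if unseen."""
    # Imported inside the function so count_tokens works without gensim installed.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import Text8Corpus

    # Fail early with a clear message instead of the "build vocabulary" error.
    if count_tokens(path) == 0:
        raise ValueError(f"{path} is empty; there is nothing to build a vocabulary from")

    # gensim 4.x names: vector_size (formerly size), model.wv.most_similar.
    model = Word2Vec(Text8Corpus(path), vector_size=200, min_count=1)

    # most_similar raises KeyError for words outside the vocabulary, so check first.
    if word not in model.wv.key_to_index:
        return []
    return model.wv.most_similar(word, topn=topn)
```

Calling `train_and_query("SupremeCourt.txt", "court")` would then return a list of (word, similarity) pairs, or an empty list if "court" never appears in the file.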
