DeprecationWarning in Gensim `most_similar`?

Question

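While implementing Word2Vec in Python 3.7, I ran into unexpected deprecation warnings. My question: what exactly does the deprecation warning about `most_similar` in gensim's Word2Vec mean?

Currently, I am getting the following output.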

DeprecationWarning: Call to deprecated `most_similar` (Method will be removed in 4.0.0, use self.wv.most_similar() instead).
  model.most_similar('hamlet')
FutureWarning: Conversion of the second argument of issubdtype from `int` to `np.signedinteger` is deprecated. In future, it will be treated as `np.int32 == np.dtype(int).type`.
  if np.issubdtype(vec.dtype, np.int):

How can I resolve this? Any help is appreciated.

The code I tried is below:

import re
from gensim.models import Word2Vec
from nltk.corpus import gutenberg

sentences = list(gutenberg.sents('shakespeare-hamlet.txt'))   
print('Type of corpus: ', type(sentences))
print('Length of corpus: ', len(sentences))

for i in range(len(sentences)):
    sentences[i] = [word.lower() for word in sentences[i] if re.match('^[a-zA-Z]+', word)]
print(sentences[0])    # title, author, and year
print(sentences[1])
print(sentences[10])
model = Word2Vec(sentences=sentences, size = 100, sg = 1, window = 3, min_count = 1, iter = 10, workers = 4)
model.init_sims(replace = True)
model.save('word2vec_model')
model = Word2Vec.load('word2vec_model')
model.most_similar('hamlet')

Answer 1

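A deprecation warning indicates that something you are using may no longer exist in a future version of Python or of a library, usually because it has been replaced by something else. (The warning message normally says what the replacement is.)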

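The warnings appear to originate inside Word2Vec rather than in your own code, so eliminating them at the source would mean going into the library and changing its code.

Try doing what the message tells you to do.

Change your model.most_similar('hamlet') to model.wv.most_similar('hamlet').

I'm not familiar with this package, so adjust this as needed for your use case.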
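For a warning raised inside library code, such as the FutureWarning quoted in the question, an alternative to editing the library is to filter it with Python's standard `warnings` module. The following is only a minimal sketch: `noisy_library_call` is a hypothetical stand-in for the library function, not part of gensim or NumPy.

```python
import warnings


def noisy_library_call():
    # Hypothetical stand-in for library code that emits a FutureWarning.
    warnings.warn("this behaviour will change in a future release", FutureWarning)
    return 42


# record=True collects any warnings that get through, so we can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                             # report everything...
    warnings.filterwarnings("ignore", category=FutureWarning)   # ...except FutureWarning
    value = noisy_library_call()

print(value, len(caught))
```

The filter only applies inside the `with` block, so other warnings elsewhere in the program are still reported.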

Answer 2

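Gensim is telling you here that eventually you will no longer be able to call the most_similar method directly on the Word2Vec model. Instead, you need to call it on the model.wv object, which holds the keyed vectors stored when the model was trained.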
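The relationship described above can be sketched with toy classes. This is not gensim's actual implementation: `ToyModel`, `ToyKeyedVectors`, and the neighbour scores are all made up for illustration. It only shows the general pattern of an old method that warns and then forwards to the object stored at `wv`.

```python
import warnings


class ToyKeyedVectors:
    """Hypothetical stand-in for the object stored at model.wv."""

    def __init__(self, neighbours):
        self._neighbours = neighbours

    def most_similar(self, word):
        return self._neighbours[word]


class ToyModel:
    """Hypothetical stand-in for a trained model that owns the vectors."""

    def __init__(self, neighbours):
        self.wv = ToyKeyedVectors(neighbours)

    def most_similar(self, word):
        # Old entry point: still works, but warns and forwards to self.wv.
        warnings.warn(
            "Call to deprecated `most_similar`, use self.wv.most_similar() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.wv.most_similar(word)


# Made-up neighbours and scores, purely for the demonstration.
model = ToyModel({"hamlet": [("horatio", 0.9), ("laertes", 0.8)]})

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_result = model.most_similar("hamlet")      # emits a DeprecationWarning
    new_result = model.wv.most_similar("hamlet")   # emits nothing

print(old_result == new_result)
print([w.category.__name__ for w in caught])
```

Both calls return the same data; only the old entry point triggers the warning, which is why switching to `model.wv.most_similar(...)` changes nothing about the results.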

Answer 3

This is a warning that the method is about to become obsolete and will stop working.

Usually things are deprecated for a few versions, giving anyone using them enough time to move to the new method before they are removed.

They have moved most_similar to wv.

So the most_similar() call should look like this:

model.wv.most_similar('hamlet')


Hope this helps.

Edit: using wv.most_similar()

import re
from gensim.models import Word2Vec
from nltk.corpus import gutenberg

sentences = list(gutenberg.sents('shakespeare-hamlet.txt'))   
print('Type of corpus: ', type(sentences))
print('Length of corpus: ', len(sentences))

for i in range(len(sentences)):
    sentences[i] = [word.lower() for word in sentences[i] if re.match('^[a-zA-Z]+', word)]
print(sentences[0])    # title, author, and year
print(sentences[1])
print(sentences[10])
model = Word2Vec(sentences=sentences, size = 100, sg = 1, window = 3, min_count = 1, iter = 10, workers = 4)
model.init_sims(replace = True)
model.save('word2vec_model')
model = Word2Vec.load('word2vec_model')
similarities = model.wv.most_similar('hamlet')
for word, score in similarities:
    print(word, score)

Answer 4

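Once you update to version 4.0.0, the function model.most_similar() will be removed. So what you can do is change the call to model.wv.most_similar(). The same goes for model.similarity(): you have to change it to model.wv.similarity().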
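To find any other deprecated calls of this kind before upgrading, one option is to escalate DeprecationWarning to an exception with the standard `warnings` module. The sketch below uses hypothetical functions `old_api` and `new_api` as stand-ins for a deprecated call and its replacement; they are not gensim functions.

```python
import warnings


def new_api():
    return "ok"


def old_api():
    # Hypothetical deprecated function that forwards to its replacement.
    warnings.warn(
        "old_api is deprecated, use new_api instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()


def find_deprecated_call(func):
    """Run func with DeprecationWarning escalated to an exception.

    Returns the warning message if func used something deprecated, else None.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            func()
        except DeprecationWarning as exc:
            return str(exc)
    return None


print(find_deprecated_call(old_api))
print(find_deprecated_call(new_api))
```

The same escalation can be applied to a whole script from the command line with `python -W error::DeprecationWarning script.py`.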

python

python-3.x

gensim

word2vec