Word2vec

Question

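Given that I have a word2vec model (via gensim), I want to get the rank of similarity between words. For example, say I have the word "desk", and the most similar words to "desk" are: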

table 0.64

chair 0.61

book 0.59

pencil 0.52

I want to create a function such that:

f(desk, book) = 3, since book is the 3rd most similar word to desk. Does such a function exist? What is the most efficient way to do this?

Answer 1

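You can use rank(entity1, entity2) to get the rank of entity2 by its distance from entity1. This corresponds to its 1-based position in the list of most similar words.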

model.wv.rank(sample_word, most_similar_word)
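To make clear what a rank means here, the following is a minimal sketch in plain Python that uses made-up toy vectors rather than a trained model. The names toy_vectors, cosine and rank are hypothetical, and this is not gensim's implementation; it only illustrates the idea of counting how many words are closer to the query word than the target is.

```python
import math

# Toy 2-D vectors, invented for illustration only (not from a trained model).
toy_vectors = {
    "desk":   (1.0, 0.0),
    "table":  (0.9, 0.1),
    "chair":  (0.8, 0.3),
    "book":   (0.6, 0.5),
    "pencil": (0.3, 0.8),
}

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rank(word, target, vectors=toy_vectors):
    """1-based rank of `target` among all other words, by cosine similarity to `word`."""
    target_sim = cosine(vectors[word], vectors[target])
    # Count every other word that is strictly more similar to `word` than `target` is.
    closer = sum(
        1
        for other, vec in vectors.items()
        if other not in (word, target) and cosine(vectors[word], vec) > target_sim
    )
    return closer + 1

print(rank("desk", "book"))  # 3
```

Because every word in the vocabulary is compared, this approach does not depend on a cutoff such as topn.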

The separate function given below is therefore not needed. It is kept for reference.

Assume you have the words and their similarity scores in a list of tuples, as returned by model.wv.most_similar(sample_word), like this:

[('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]

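The following function takes the sample word and the most similar word as arguments, and returns the 1-based rank as a list (e.g. [3]) if the word is present in the output, or an empty list otherwise: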

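def rank_of_most_similar_word(sample_word, most_similar_word):
    # most_similar() returns (word, similarity) tuples, best match first (top 10 by default)
    neighbours = model.wv.most_similar(sample_word)
    # 1-based rank of the target word; empty list if it is not among the neighbours
    return [i + 1 for i, (word, _) in enumerate(neighbours) if word == most_similar_word]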

sample_word = 'desk'
most_similar_word = 'book'
rank_of_most_similar_word(sample_word, most_similar_word)

Note: when calling model.wv.most_similar(), pass topn=x to get the top x most similar words, as suggested in the comments.
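The effect of topn can be sketched without a model. In the snippet below, fake_most_similar is a hypothetical stand-in for model.wv.most_similar() that returns the example list for "desk" truncated to topn entries, and rank_within_topn is an illustrative helper, not part of gensim.

```python
# Example neighbours of "desk" from the question, best match first.
NEIGHBOURS = [("table", 0.64), ("chair", 0.61), ("book", 0.59), ("pencil", 0.52)]

def fake_most_similar(word, topn=10):
    # Hypothetical stand-in for model.wv.most_similar(word, topn=topn).
    return NEIGHBOURS[:topn]

def rank_within_topn(word, target, topn=10):
    """Return the 1-based rank of `target`, or None if it is outside the top `topn`."""
    neighbours = fake_most_similar(word, topn=topn)
    for position, (candidate, _) in enumerate(neighbours, start=1):
        if candidate == target:
            return position
    return None

print(rank_within_topn("desk", "book", topn=2))  # None: "book" is not in the top 2
print(rank_within_topn("desk", "book", topn=4))  # 3
```

If the target word falls outside the requested topn, no rank is found, so topn has to be at least as large as the rank you expect.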

Word2vec - get rank of similarity

python

nlp

python-3.x

gensim