word2vec for dictionary of words

Question

I need to generate word2vec arrays for a dictionary of words. The dictionary looks like this:

test={0: 'tench, Tinca tinca',
 1: 'goldfish, Carassius auratus',
 2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
 3: 'tiger shark, Galeocerdo cuvieri',
 4: 'hammerhead, hammerhead shark'}

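The loop should iterate over each line and check whether a word is present in the model. If it is, the loop stores its vector in the array; otherwise it checks the next word in that line. If none of the words appear in the gensim model, it does nothing (the array is initialized with zeros). However, when a word is not present in the pretrained model, this exception is raised: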

KeyError: "word 'Galeocerdo cuvieri' not in vocabulary"

What would the ideal loop, with exception handling, look like to get around this error? Here is my starting code:

import gensim
import numpy as np

model = gensim.models.KeyedVectors.load_word2vec_format('/home/shikhar/Downloads/GoogleNews-vectors-negative300.bin', binary=True)
array = np.zeros((len(test), 300))  # one row per dictionary entry (test has 5)
for i in test:
    synonyms = test[i].split(',')

Answer 1

Why not try this:

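import numpy as np

vectors = list()
for i in test:
    found = False
    synonyms = test[i].split(',')
    for k in synonyms:
        k = k.strip()  # drop the space left after each comma
        try:
            vectors.append(model[k])
            found = True
            break  # keep only the first synonym found in the vocabulary
        except KeyError as e:
            print(e)  # word not in vocabulary; try the next synonym
    if not found:
        # none of the synonyms is in the model: use a zero vector
        vectors.append(np.zeros(model.vector_size))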

I assume you want all of the vectors in a list.
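If you would rather fill the question's preallocated NumPy array than build a list, the same first-match logic can write straight into it. A minimal sketch, assuming any mapping-like `model` that raises `KeyError` for unknown words (a loaded `KeyedVectors` behaves this way); the helper name `dict_to_matrix` and the toy vectors are hypothetical:

```python
import numpy as np

def dict_to_matrix(test, model, dim):
    """Return a (len(test), dim) array; row r holds the vector of the first
    synonym of the r-th entry (in sorted key order) found in `model`,
    or zeros if none of its synonyms is found."""
    keys = sorted(test)                  # fixed row order: 0, 1, 2, ...
    array = np.zeros((len(keys), dim))   # rows stay zero unless filled
    for row, key in enumerate(keys):
        for word in test[key].split(","):
            word = word.strip()          # drop the space after each comma
            try:
                array[row] = model[word]
                break                    # first match wins; stop looking
            except KeyError:
                continue                 # not in vocabulary; try the next one
    return array

# Hypothetical toy model standing in for KeyedVectors.
toy_model = {"goldfish": np.array([1.0, 2.0]), "hammerhead": np.array([3.0, 4.0])}
test = {0: "tench, Tinca tinca", 1: "goldfish, Carassius auratus",
        4: "hammerhead, hammerhead shark"}
m = dict_to_matrix(test, toy_model, 2)
print(m)
```

With the real model you would pass `model.vector_size` (300 for the GoogleNews vectors) as `dim`. When the keys are 0 to n-1, as in the question, row r corresponds to key r.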


python

nlp

gensim

word2vec