Having trouble loading custom trained word vectors created in Gensim into spaCy

Question

I trained a model:

from gensim.models import Word2Vec

model = Word2Vec(master_sent_list,
                 min_count=5,
                 size=300,
                 workers=5,
                 window=5,
                 iter=30)

Following this post, I saved the vectors and converted them into a spaCy model:

model.wv.save_word2vec_format("../moj_word2vec.txt")
!gzip ../moj_word2vec.txt
!python -m spacy init-model en ../moj_word2vec.model --vectors-loc ../moj_word2vec.txt.gz

Everything went fine:

✔ Successfully created model
22470it [00:02, 8397.55it/s]j_word2vec.txt.gz
✔ Loaded vectors from ../moj_word2vec.txt.gz
✔ Sucessfully compiled vocab
22835 entries, 22470 vectors

Then I loaded the model under a different name:

import spacy
nlp = spacy.load('../moj_word2vec.model/')

But something went wrong, because I can't use the usual commands on nlp, although I can on model.

For example, these work:

model.wv.most_similar('police')
model.vector_size

But these don't:

nlp.wv.most_similar('police')
AttributeError: 'English' object has no attribute 'wv'

nlp.most_similar('police')
AttributeError: 'English' object has no attribute 'most_similar'

nlp.vector_size
AttributeError: 'English' object has no attribute 'vector_size'

Something seems to be wrong with the loading or the saving. Can anyone help?

Answer 1

Nothing is broken; you just have the wrong expectations.

The spaCy model loaded into your nlp variable won't support methods from the gensim model classes.

It's a different library, with different code, classes, and API. It doesn't itself use gensim code, even though it can import a plain set of vectors from the plain word2vec_format.
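To make that concrete: the word2vec text format is only a header line giving the number of words and the vector dimensionality, followed by one line per word with its values. Below is a minimal sketch of a reader. The function name read_word2vec_text and the three-word sample are made up for illustration and are not part of gensim or spaCy.

```python
import io


def read_word2vec_text(handle):
    """Parse word2vec text format into a {word: [floats]} dict."""
    # Header line: "<number of words> <vector dimensionality>"
    count, dim = (int(field) for field in handle.readline().split())
    vectors = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} words, found {len(vectors)}")
    return vectors


# Made-up sample data, for illustration only.
sample = io.StringIO(
    "3 2\n"
    "police 0.9 0.1\n"
    "officer 0.8 0.2\n"
    "banana 0.0 1.0\n"
)
vectors = read_word2vec_text(sample)
print(sorted(vectors))    # the words stored in the file
print(vectors["police"])  # the raw numbers for one word
```

The file carries words and numbers and nothing else. Methods such as most_similar are not stored in it; each library supplies its own once it has read the vectors.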

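(For example, compare the result of type(model) or type(model.wv) on your working gensim model with type(nlp) on the spaCy object you create later: they are completely different types, with different methods and properties.)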
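A quick way to see the difference without reading either library's source is to ask each object which attributes it has. This is a minimal sketch: the helper name missing_attributes is hypothetical, and the two stand-in classes only mimic the situation in the question rather than the real gensim or spaCy types.

```python
def missing_attributes(obj, names):
    """Return the attribute names from `names` that `obj` does not provide."""
    return [name for name in names if not hasattr(obj, name)]


class GensimLike:
    """Stand-in for a gensim model: exposes `wv` and `vector_size`."""
    wv = object()
    vector_size = 300


class SpacyLike:
    """Stand-in for a spaCy pipeline: exposes `vocab` instead."""
    vocab = object()


wanted = ["wv", "vector_size"]
for obj in (GensimLike(), SpacyLike()):
    # Print the type name next to whatever it lacks.
    print(type(obj).__name__, missing_attributes(obj, wanted))
```

With the real objects, missing_attributes(nlp, ["wv", "most_similar", "vector_size"]) would list all three names, which matches the AttributeErrors shown in the question.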

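You'll have to use some combination of:

checking the spaCy docs for equivalent operations
loading the vectors into a gensim model class, if you need gensim operations. For example: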

from gensim.models.keyedvectors import KeyedVectors
wv = KeyedVectors.load_word2vec_format(filename)
# then do gensim ops on the `wv` object
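For the first option, spaCy keeps the imported vectors under nlp.vocab rather than on the pipeline object itself. Here is a rough sketch of equivalents for the gensim calls in the question. It assumes spaCy's documented Vocab.has_vector, Vocab.vectors_length, and Vectors.most_similar; the helper names are hypothetical, and argument details may differ between spaCy versions.

```python
def spacy_vector_size(nlp):
    """Rough counterpart of gensim's `model.vector_size`."""
    return nlp.vocab.vectors_length


def spacy_most_similar(nlp, word, n=10):
    """Rough counterpart of gensim's `model.wv.most_similar(word)`.

    Returns up to n (word, similarity) pairs from spaCy's vector table.
    """
    if not nlp.vocab.has_vector(word):
        raise KeyError(f"no vector for {word!r}")
    # Vectors.most_similar expects a 2D array: one query vector per row.
    query = nlp.vocab[word].vector.reshape(1, -1)
    keys, _rows, scores = nlp.vocab.vectors.most_similar(query, n=n)
    # Keys are integer hashes; the StringStore maps them back to text.
    return [
        (nlp.vocab.strings[int(key)], float(score))
        for key, score in zip(keys[0], scores[0])
    ]
```

With the model from the question loaded as nlp, spacy_most_similar(nlp, 'police') plays the role of model.wv.most_similar('police'), and spacy_vector_size(nlp) that of model.vector_size. Unlike gensim, this lookup does not exclude the query word, so it will usually come back as its own nearest neighbour.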

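(You could also save the entire gensim Word2Vec model using the .save() method, which uses Python pickling to store it in one or more files. It can then be reloaded into a gensim Word2Vec model using Word2Vec.load(), though if you only need to look up individual word vectors by word key, you don't need the full model.)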


python-3.x

gensim

spacy