How to fix unpickling key error when loading word2vec (gensim)?

I'm trying to load a pre-trained word2vec model in .pkl format, taken from here.

The line of code I'm using to load it:

model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl') 

However, I keep getting the following error (full traceback):

UnpicklingError                           Traceback (most recent call last)
<ipython-input-15-ebd5780b6636> in <module>
     55 
     56 #Load pretrained word2vec
---> 57 model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl',mmap='r')
     58 

~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
   1551     @classmethod
   1552     def load(cls, fname_or_handle, **kwargs):
-> 1553         model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
   1554         if isinstance(model, FastTextKeyedVectors):
   1555             if not hasattr(model, 'compatible_hash'):

~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
    226     @classmethod
    227     def load(cls, fname_or_handle, **kwargs):
--> 228         return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
    229 
    230     def similarity(self, entity1, entity2):

~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in load(cls, fname, mmap)
    433         compress, subname = SaveLoad._adapt_by_suffix(fname)
    434 
--> 435         obj = unpickle(fname)
    436         obj._load_specials(fname, mmap, compress, subname)
    437         logger.info("loaded %s", fname)

~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in unpickle(fname)
   1396         # Because of loading from S3 load can't be used (missing readline in smart_open)
   1397         if sys.version_info > (3, 0):
-> 1398             return _pickle.load(f, encoding='latin1')
   1399         else:
   1400             return _pickle.loads(f.read())

UnpicklingError: invalid load key, ':'.

I also tried loading it with load_word2vec_format, but that didn't work either. Any ideas?

According to your link, https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, these files are meant to be loaded with that library's Wikipedia2Vec.load() method.

Gensim's .load() method only works on files that were saved directly from Gensim model objects (via their .save() method).
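The traceback above shows that Gensim's load ends in a plain `_pickle.load(f, encoding='latin1')` call, so any file not written by Python's pickle fails on its very first byte. The stdlib-only sketch below reproduces that class of error; `try_gensim_style_unpickle` is a hypothetical helper that only mimics that last step, and the `b":..."` bytes are an arbitrary stand-in, not the actual contents of the downloaded file.

```python
import io
import pickle


def try_gensim_style_unpickle(data: bytes):
    """Hypothetical helper: mimic only the final step of gensim's load seen in the
    traceback, i.e. a plain pickle load with encoding='latin1'."""
    return pickle.load(io.BytesIO(data), encoding="latin1")


# Bytes produced by pickle itself round-trip fine.
ok = try_gensim_style_unpickle(pickle.dumps({"vector_size": 500}))
print(ok)  # {'vector_size': 500}

# Bytes not produced by pickle fail at the first byte, because ':' is not a
# pickle opcode. (Stand-in data, NOT the real file's contents.)
try:
    try_gensim_style_unpickle(b":not a pickle")
except pickle.UnpicklingError as err:
    print("UnpicklingError:", err)
```

In CPython the C accelerator raises the same "invalid load key" message seen at the bottom of the traceback. It only means "this is not a pickle Gensim can read", not that the download is corrupt.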

The Wikipedia2Vec project does say that their .txt files can be loaded with .load_word2vec_format(), so you could try that too, but with one of their .txt files rather than the .pkl.

Their full-model .pkl files can only be loaded with their own class's load function.
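Putting the answer's two routes together, a minimal sketch might pick the loader from the file extension. `choose_loader` and `load_embeddings` are hypothetical helper names, and the sketch assumes any compressed download (e.g. a `.bz2` file) has already been decompressed. The imports are deferred so only the library you actually need has to be installed.

```python
from pathlib import Path


def choose_loader(path: str) -> str:
    """Hypothetical helper: choose a loader for a Wikipedia2Vec pretrained file.

    Assumption: compressed downloads (e.g. *.bz2) were decompressed first.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pkl":
        return "wikipedia2vec"  # full model: only Wikipedia2Vec.load() reads it
    if suffix == ".txt":
        return "gensim-text"  # word2vec text format, readable by gensim
    raise ValueError(
        f"unsupported file type {suffix!r}; decompress it or use a .pkl/.txt file"
    )


def load_embeddings(path: str):
    """Hypothetical wrapper: lazy imports, so only the needed library must be installed."""
    if choose_loader(path) == "wikipedia2vec":
        from wikipedia2vec import Wikipedia2Vec

        return Wikipedia2Vec.load(path)
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(path, binary=False)
```

Usage would be something like `wiki2vec = load_embeddings("enwiki_20180420_500d.pkl")` followed by `wiki2vec.get_word_vector("the")`, or `kv = load_embeddings("enwiki_20180420_500d.txt")` followed by `kv["the"]` for the text file.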