How to fix unpickling key error when loading word2vec (gensim)?
I am trying to load a pretrained word2vec model in pkl format, taken from here.
The line of code I use to load it:
model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl')
However, I keep getting the following error (full traceback):
UnpicklingError Traceback (most recent call last)
<ipython-input-15-ebd5780b6636> in <module>
55
56 #Load pretrained word2vec
---> 57 model = gensim.models.KeyedVectors.load('enwiki_20180420_500d.pkl',mmap='r')
58
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
1551 @classmethod
1552 def load(cls, fname_or_handle, **kwargs):
-> 1553 model = super(WordEmbeddingsKeyedVectors, cls).load(fname_or_handle, **kwargs)
1554 if isinstance(model, FastTextKeyedVectors):
1555 if not hasattr(model, 'compatible_hash'):
~/anaconda3/lib/python3.7/site-packages/gensim/models/keyedvectors.py in load(cls, fname_or_handle, **kwargs)
226 @classmethod
227 def load(cls, fname_or_handle, **kwargs):
--> 228 return super(BaseKeyedVectors, cls).load(fname_or_handle, **kwargs)
229
230 def similarity(self, entity1, entity2):
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in load(cls, fname, mmap)
433 compress, subname = SaveLoad._adapt_by_suffix(fname)
434
--> 435 obj = unpickle(fname)
436 obj._load_specials(fname, mmap, compress, subname)
437 logger.info("loaded %s", fname)
~/anaconda3/lib/python3.7/site-packages/gensim/utils.py in unpickle(fname)
1396 # Because of loading from S3 load can't be used (missing readline in smart_open)
1397 if sys.version_info > (3, 0):
-> 1398 return _pickle.load(f, encoding='latin1')
1399 else:
1400 return _pickle.loads(f.read())
UnpicklingError: invalid load key, ':'.
I also tried loading it with load_word2vec_format, but had no luck. Any ideas?
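According to your link, https://wikipedia2vec.github.io/wikipedia2vec/pretrained/, these files are meant to be loaded with that library's Wikipedia2Vec.load() method.
Gensim's .load() method should only be used on files that were saved directly from Gensim model objects.
The Wikipedia2Vec project does say that its .txt file format loads with .load_word2vec_format(), so you could try that as well, but with one of their .txt format files.
Their full-model .pkl files can only be loaded with their class's own load function.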
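
To make the two routes concrete, here is a minimal sketch that picks a loader from the file extension. The helper names pick_loader and load_embeddings are hypothetical, introduced only for illustration, and the sketch assumes the downloaded file has already been decompressed and that the wikipedia2vec and gensim packages are installed. The third-party imports sit inside the function so that only the branch you actually use needs its package.

```python
from pathlib import Path


def pick_loader(path):
    """Decide which library should read the file, based on its extension.

    Hypothetical helper: '.pkl' is treated as a full Wikipedia2Vec model,
    '.txt' as a word2vec-format text file that Gensim can read.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".pkl":
        return "wikipedia2vec"
    if suffix == ".txt":
        return "word2vec-text"
    raise ValueError(f"unsupported embedding file type: {suffix!r}")


def load_embeddings(path):
    """Load a pretrained Wikipedia2Vec file with the matching library."""
    kind = pick_loader(path)
    if kind == "wikipedia2vec":
        # Full-model .pkl files go through the project's own class.
        from wikipedia2vec import Wikipedia2Vec
        return Wikipedia2Vec.load(path)
    # The .txt files are plain word2vec text format, not Gensim-saved
    # objects, so use load_word2vec_format() rather than .load().
    from gensim.models import KeyedVectors
    return KeyedVectors.load_word2vec_format(path, binary=False)


# Example call (assumes the file exists locally):
# model = load_embeddings("enwiki_20180420_500d.pkl")
```

Note that the two branches return different object types (a Wikipedia2Vec model versus a Gensim KeyedVectors instance), so code written against the result depends on which file you loaded.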