Unable to Load Model Trained in Gensim: pickle-related error
When trying to load a word2vec model trained with Gensim on a Windows machine, I get the following error:
AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
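I have successfully trained many models with Gensim on this system in the past. The only change is that this time I separated the model.build_vocab() and model.train() stages, adding saving and timing hooks for each epoch. I also used different iterators for the vocabulary-building and training phases, but over the same dataset with the same tokenization pipeline.
Here's how I did the epoch progress tracking/saving: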
import time
from gensim.models.callbacks import CallbackAny2Vec
from gensim.test.utils import get_tmpfile

class EpochProgress(CallbackAny2Vec):
    '''saves the model after each epoch'''
    def __init__(self, path_prefix):
        self.path_prefix = path_prefix
        self.epoch = 0
        self.start_time = time.time()

    def on_epoch_begin(self, model):
        print("epoch #{} started".format(self.epoch))

    def on_epoch_end(self, model):
        print("epoch #{} completed".format(self.epoch))
        passed = (time.time() - self.start_time) / 60 / 60  # elapsed time since start in HOURS
        print("{} hours have passed".format(str(passed)))
        output_path = get_tmpfile('{}_epoch{}.model'.format(self.path_prefix, self.epoch))
        model.save(output_path)  # note: this pickles model.callbacks too
        print("model saved at: {}".format(output_path))
        self.epoch += 1

epoch_progress = EpochProgress('E:/jade_prism/embeddings/phrase-embed-over- time/mega_WOS_word2vec/w2v_models/in_progress/')
Then I load the baseline model, which already has its vocabulary built, and set some parameters:
model = gensim.models.Word2Vec.load(baseline_models_directory+chosen_name)
model.window = window
model.size = size
model.workers = workers
model.callbacks = [epoch_progress]
Then I train like this:
model.train(corpus, total_examples=model.corpus_count, epochs=epochs)
Finally, I save the final product like this:
model.save('E:/w2v_models/trained/{}'.format(new_model_filename))
Training appears to work fine and the model saves as expected. Unfortunately, I now can't load it.
Here's the full debug readout:
> AttributeError                            Traceback (most recent call last)
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1329         try:
> -> 1330             model = super(Word2Vec, cls).load(*args, **kwargs)
>    1331
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, *args, **kwargs)
>    1243         """
> -> 1244         model = super(BaseWordEmbeddingsModel, cls).load(*args, **kwargs)
>    1245         if not hasattr(model, 'ns_exponent'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\base_any2vec.py in load(cls, fname_or_handle, **kwargs)
>     602         """
> --> 603         return super(BaseAny2VecModel, cls).load(fname_or_handle, **kwargs)
>     604
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in load(cls, fname, mmap)
>     425
> --> 426         obj = unpickle(fname)
>     427         obj._load_specials(fname, mmap, compress, subname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\utils.py in unpickle(fname)
>    1383         if sys.version_info > (3, 0):
> -> 1384             return _pickle.load(f, encoding='latin1')
>    1385         else:
>
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
>
> During handling of the above exception, another exception occurred:
>
> AttributeError                            Traceback (most recent call last)
> <ipython-input-4-0206f9f8f3ad> in <module>
>       3
>       4 # Load the model based onthe model name
> ----> 5 model = gensim.models.Word2Vec.load(model_name)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\word2vec.py in load(cls, *args, **kwargs)
>    1339             logger.info('Model saved using code from earlier Gensim Version. Re-loading old model in a compatible way.')
>    1340             from gensim.models.deprecated.word2vec import load_old_word2vec
> -> 1341             return load_old_word2vec(*args, **kwargs)
>    1342
>    1343
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load_old_word2vec(*args, **kwargs)
>     170
>     171 def load_old_word2vec(*args, **kwargs):
> --> 172     old_model = Word2Vec.load(*args, **kwargs)
>     173     vector_size = getattr(old_model, 'vector_size', old_model.layer1_size)
>     174     params = {
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\word2vec.py in load(cls, *args, **kwargs)
>    1639     @classmethod
>    1640     def load(cls, *args, **kwargs):
> -> 1641         model = super(Word2Vec, cls).load(*args, **kwargs)
>    1642         # update older models
>    1643         if hasattr(model, 'table'):
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in load(cls, fname, mmap)
>      85         compress, subname = SaveLoad._adapt_by_suffix(fname)
>      86
> ---> 87         obj = unpickle(fname)
>      88         obj._load_specials(fname, mmap, compress, subname)
>      89         logger.info("loaded %s", fname)
>
> C:\anaconda\envs\mega_WOS\lib\site-packages\gensim\models\deprecated\old_saveload.py in unpickle(fname)
>     377                 b'gensim.models.wrappers.fasttext', b'gensim.models.deprecated.fasttext_wrapper')
>     378         if sys.version_info > (3, 0):
> --> 379             return _pickle.loads(file_bytes, encoding='latin1')
>     380         else:
>     381             return _pickle.loads(file_bytes)
>
> AttributeError: Can't get attribute 'EpochProgress' on <module '__main__'>
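Python pickling/unpickling can run into problems when what gets saved includes blocks of code, or classes/instances-of-classes that you defined before saving but that may not be available at load time. (This applies especially to anonymous or global-scoped types that weren't imported from an explicit path.) A pickle stores only the name of a class and the module it lived in, not its code, so the same name has to be resolvable again when the file is loaded.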
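To make the mechanism concrete, here is a minimal, gensim-free sketch that uses only the standard library. The module name `callbacks_demo` and the stripped-down `EpochProgress` class are hypothetical stand-ins for `__main__` and the real callback; the point is only that unpickling fails while the class name cannot be resolved and succeeds once it is defined again.

```python
import pickle
import sys
import types

# Source of a tiny stand-in class (hypothetical, not the real gensim callback).
CLASS_SOURCE = (
    "class EpochProgress:\n"
    "    def __init__(self):\n"
    "        self.epoch = 0\n"
)

# A throwaway module that plays the role of __main__ in the training session.
demo = types.ModuleType("callbacks_demo")
sys.modules["callbacks_demo"] = demo
exec(CLASS_SOURCE, demo.__dict__)

callback = demo.EpochProgress()
callback.epoch = 3
# The pickle records the module name, the class name and the instance attributes,
# but none of the class's code.
payload = pickle.dumps(callback)

# Simulate a fresh session in which the class was never defined.
del demo.EpochProgress
try:
    pickle.loads(payload)
    load_failed = False
except AttributeError as err:
    load_failed = True
    print("load failed:", err)

# Define the class again under the same module and name, then retry.
exec(CLASS_SOURCE, demo.__dict__)
restored = pickle.loads(payload)
print("restored epoch:", restored.epoch)
```

The failing branch raises the same kind of `AttributeError: Can't get attribute ...` seen in the traceback, just against the demo module instead of `__main__`.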
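This is a known issue with gensim model saving, and a future version may avoid storing such callback code in the model at all. (Instead, you would have to specify callbacks each time you call a method that uses them, and they would only be in effect for that call.)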
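Until then, one possible way to keep the callback out of a saved file is to empty the attribute for the duration of the save. The sketch below assumes the callback reference lives in `model.callbacks`, as in the question's code; `callbacks_detached` is a hypothetical helper, not part of gensim, and `FakeModel` is only a stand-in so the snippet runs without gensim installed.

```python
from contextlib import contextmanager


@contextmanager
def callbacks_detached(model):
    """Temporarily empty model.callbacks so a save inside the block omits them."""
    saved = getattr(model, "callbacks", ())
    model.callbacks = ()
    try:
        yield model
    finally:
        # Put the callbacks back so any further training still reports progress.
        model.callbacks = saved


class FakeModel:
    """Stand-in for a trained model; records what save() would have pickled."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.saved_callbacks = None

    def save(self, path):
        self.saved_callbacks = self.callbacks


model = FakeModel(callbacks=["epoch_progress placeholder"])
with callbacks_detached(model):
    model.save("unused-path.model")

print(model.saved_callbacks)  # what ended up "in the file"
print(model.callbacks)        # restored after the block
```

With a real model the usage would be the same `with callbacks_detached(model): model.save(path)` pattern around the final save. Whether that is sufficient for a given gensim version is an assumption worth checking by loading the file in a fresh session.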
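For more details, see gensim project issue #2136, including a workaround that appears to have helped others reload their models: make sure the EpochProgress class is defined/imported in the same way in the session where the load is attempted.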
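A rough sketch of that workaround follows. The simplest form is to re-run the original `EpochProgress` definition in the notebook before calling `Word2Vec.load`. The version below is a variation for the case where the loading code lives in a module rather than in `__main__`. The helper names `expose_in_main` and `load_model` are hypothetical, and the stand-in class rests on the assumption that unpickling only needs a class with the right name to exist; prefer the original class definition whenever you have it.

```python
import sys


class EpochProgress:
    """Stand-in with the same name as the training-time callback.

    Assumption: the unpickler only needs to find a class called EpochProgress;
    the original definition is the safer choice when it is available.
    """

    def on_epoch_begin(self, model):
        pass

    def on_epoch_end(self, model):
        pass


def expose_in_main(cls):
    """Make cls reachable as __main__.<ClassName>, where the pickle looks for it."""
    setattr(sys.modules["__main__"], cls.__name__, cls)
    return cls


def load_model(model_path):
    """Load a saved Word2Vec model after making EpochProgress resolvable."""
    from gensim.models import Word2Vec  # imported lazily; requires gensim

    expose_in_main(EpochProgress)
    return Word2Vec.load(model_path)
```

Calling `load_model(model_name)` would then take the place of the failing `gensim.models.Word2Vec.load(model_name)` call from the traceback.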