How do I save a Gensim model while ensuring forwards compatibility?
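I'm using the save method on the Gensim Phrases class to store a model for future use, but when I update my Gensim version I run into problems reloading that model. For example, loading a model in Gensim 2.3.0 that was created in 2.2.0 raises the following error: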
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<timed exec> in <module>()
~/Stuff/Sources/anaconda3/envs/nlp/lib/python3.6/site-packages/gensim/models/phrases.py in __init__(self, phrases_model)
395 self.min_count = phrases_model.min_count
396 self.delimiter = phrases_model.delimiter
--> 397 self.scoring = phrases_model.scoring
398 self.phrasegrams = {}
399 corpus = pseudocorpus(phrases_model.vocab, phrases_model.delimiter)
AttributeError: 'Phrases' object has no attribute 'scoring'
Is there a better way to ensure forwards compatibility?
I've only used gensim a few times and I'm new to it, but judging from the Change Log, the scoring attribute was introduced on the Phrases class in 2.3.0.
From what I've seen in the GitHub issues, the maintainers try to keep backwards compatibility when saving and loading models. It looks like the "missing scoring" attribute problem was addressed in 3.1.0; see the "backwards scoring compatibility when loading a Phrases class" comment and the related discussion in the pull request. The idea of the fix was to improve the load() method so that it handles missing attributes by implicitly replacing them with defaults, which prevents the load from failing.
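The load-time fix described above can be sketched in plain Python. This is a minimal illustration, not gensim's actual implementation: `VersionedModel` and `ATTRIBUTE_DEFAULTS` are hypothetical names, and `"default"` is only an assumed stand-in for whatever default the newer version uses for `scoring`.

```python
import os
import pickle
import tempfile


class VersionedModel:
    """Hypothetical model class (not gensim's Phrases) whose load() backfills new attributes."""

    # Attributes added in later versions, with the value to assume for older pickles.
    # Using "default" for `scoring` is an illustrative assumption.
    ATTRIBUTE_DEFAULTS = {"scoring": "default"}

    def __init__(self, min_count=5, delimiter=b"_", scoring="default"):
        self.min_count = min_count
        self.delimiter = delimiter
        self.scoring = scoring

    def save(self, path):
        with open(path, "wb") as fout:
            pickle.dump(self, fout)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as fin:
            obj = pickle.load(fin)  # restores the saved __dict__ without calling __init__
        # Fill in any attribute that the saving version did not have yet.
        for name, default in cls.ATTRIBUTE_DEFAULTS.items():
            if not hasattr(obj, name):
                setattr(obj, name, default)
        return obj


# Simulate a model pickled by an older version that had no `scoring` attribute.
old = VersionedModel(min_count=3)
del old.scoring
path = os.path.join(tempfile.mkdtemp(), "old_model.pkl")
old.save(path)

restored = VersionedModel.load(path)
print(restored.scoring)    # default
print(restored.min_count)  # 3
```

Keeping the defaults in one class-level mapping means each newly added attribute needs only one entry to stay loadable from older saved models.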
As far as I can tell, in 2.3.0 gensim has this generic SaveLoad class for pickling/unpickling models. As you can see, it's very simple and contains no model-specific logic.
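To see why a generic pickle-based loader fails in this situation, consider the following sketch, which uses only the standard-library pickle module. The `Phrases` class here is a hypothetical stand-in, not gensim's: pickle records an instance's class by name together with its `__dict__`, and unpickling restores that dictionary without running `__init__`, so an attribute introduced by a newer class definition is simply absent.

```python
import pickle


class Phrases:
    """Hypothetical 'old' version of a model class (not gensim's code)."""

    def __init__(self):
        self.min_count = 5


# The old version pickles the instance: only the class name and __dict__ are stored.
saved = pickle.dumps(Phrases())


class Phrases:
    """Hypothetical 'new' version of the same class, which adds `scoring`."""

    def __init__(self):
        self.min_count = 5
        self.scoring = "default"


# pickle.loads looks the class up by name (finding the new definition) and restores
# the saved __dict__ without calling __init__, so `scoring` is never set.
loaded = pickle.loads(saved)
print(isinstance(loaded, Phrases))  # True
print(hasattr(loaded, "scoring"))   # False
```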
I'm not sure whether, or how, models can be kept compatible between 2.2.0 and 2.3.0. I would open a new issue on the gensim issue tracker.