How do I save a Gensim model while ensuring forwards compatibility?

Question

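I use the save method on the Gensim Phrases class to store models for future use, but if I update my Gensim version, I run into problems when reloading the model. For example, I get the following error when loading, in Gensim 2.3.0, a model that was created in 2.2.0: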

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>()

~/Stuff/Sources/anaconda3/envs/nlp/lib/python3.6/site-packages/gensim/models/phrases.py in __init__(self, phrases_model)
    395         self.min_count = phrases_model.min_count
    396         self.delimiter = phrases_model.delimiter
--> 397         self.scoring = phrases_model.scoring
    398         self.phrasegrams = {}
    399         corpus = pseudocorpus(phrases_model.vocab, phrases_model.delimiter)

AttributeError: 'Phrases' object has no attribute 'scoring'

Is there a better way to ensure forwards compatibility?

Answer 1

I have only used gensim a few times and am a newcomer, but judging from the change log, the scoring attribute was introduced on the Phrases class in 2.3.0.

Now, from what I have noticed in the GitHub issues, the maintainers are trying to keep backward compatibility when saving and loading models. It looks like the missing "scoring" attribute problem was addressed in 3.1.0 - see the "backwards scoring compatibility when loading a Phrases class" comment and the related discussion in the pull request. The idea of the fix was basically to improve the load() method so that it handles missing attributes and implicitly replaces them with the defaults, so that loading does not fail.
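To make the "fill in defaults on load" idea concrete, here is a minimal sketch of the pattern. It is not gensim's code: `ToyModel`, `_DEFAULTS`, and `write_old_format` are hypothetical names made up for illustration, and for simplicity the sketch pickles only the attribute dictionary rather than the whole object.

```python
import os
import pickle
import tempfile


class ToyModel:
    """Hypothetical stand-in for a model class; not part of gensim."""

    # Attributes introduced in later releases, mapped to the value to
    # assume when an older file does not contain them (assumed values).
    _DEFAULTS = {"scoring": "default"}

    def __init__(self, min_count=5, scoring="default"):
        self.min_count = min_count
        self.scoring = scoring

    def save(self, path):
        # Simplification: store only the attribute dictionary.
        with open(path, "wb") as fh:
            pickle.dump(dict(self.__dict__), fh)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as fh:
            state = pickle.load(fh)
        # Start from the defaults, then let the stored values override
        # them, so a missing attribute never causes a failure.
        merged = dict(cls._DEFAULTS)
        merged.update(state)
        model = cls.__new__(cls)  # bypass __init__, as unpickling would
        model.__dict__.update(merged)
        return model


def write_old_format(path):
    """Simulate a file written by an older release without 'scoring'."""
    with open(path, "wb") as fh:
        pickle.dump({"min_count": 3}, fh)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "old_model.pkl")
    write_old_format(path)
    model = ToyModel.load(path)
    print(model.min_count, model.scoring)
```

With a loader like this, a file written before `scoring` existed still loads, and the attribute silently takes the assumed default. Whether a default is actually appropriate for a given attribute is a judgment call for each model.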

I think that in gensim 2.3.0 there is this generic SaveLoad class for pickling/unpickling models - as you can see, it is very simple and contains no model-specific logic.

I am not sure whether, or how, a model can be kept compatible between 2.2.0 and 2.3.0. I would open a new issue on the gensim issue tracker.

python

nlp

machine-learning

gensim

data-science