Why does pandas attempt to import a module when reading from a pickled file?

I collected some data through the Instagram API and stored it in a pandas DataFrame, which I then saved with the pandas .to_pickle() method.

When I try to load the DataFrame on another computer with the read_pickle() method, I get the following error:

Traceback (most recent call last):
  File "examine.py", line 14, in <module>
    dataframe = pd.read_pickle(args["dataframe"])
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/io/pickle.py", line 65, in read_pickle
    return try_read(path)
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/io/pickle.py", line 62, in try_read
    return pc.load(fh, encoding=encoding, compat=True)
  File "/home/user/virtualenvs/geopandas/local/lib/python2.7/site-packages/pandas/compat/pickle_compat.py", line 117, in load
    return up.load()
  File "/usr/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named instagram.models

Any idea what is causing this?

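Pickle does not know how to recreate classes on its own. The information about how an instance is unpickled and restored lives in the class itself: __new__, __init__, __setstate__ and so on.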

Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s __setstate__() method.

Source: Python pickle: What can be pickled and unpickled?

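So to unpickle the data, pickle needs to import the class (and any intermediate modules). Judging by the traceback, your DataFrame holds objects from instagram.models, and that module is not installed on the second computer, hence the ImportError.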
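A minimal sketch of that behaviour, using only the standard library. The module name fake_instagram_models and the Media class are hypothetical stand-ins for instagram.models, and removing the module from sys.modules is just a way to imitate a machine where the package is not installed:

```python
import pickle
import sys
import types

# Hypothetical stand-in for a third-party module such as instagram.models.
fake_module = types.ModuleType("fake_instagram_models")


class Media(object):
    """Hypothetical API object; not the real instagram.models.Media."""

    def __init__(self, media_id, likes):
        self.media_id = media_id
        self.likes = likes


# Make the class look as if it had been defined inside the fake module.
Media.__module__ = "fake_instagram_models"
fake_module.Media = Media
sys.modules["fake_instagram_models"] = fake_module

payload = pickle.dumps(Media("123", 42))

# The payload records the module and class names, not the class code.
print(b"fake_instagram_models" in payload, b"Media" in payload)

# Imitate "another computer" where the module cannot be imported.
del sys.modules["fake_instagram_models"]

try:
    pickle.loads(payload)
    load_error = None
except ImportError as exc:
    # pickle tried to import the module named in the payload and failed.
    load_error = exc

print(repr(load_error))
```

pandas.read_pickle goes through the same unpickling machinery, so any object stored in a DataFrame cell pulls in its defining module at load time.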

If you don't have or don't want the instagram module on that machine, convert the relevant values in the original DataFrame to plain types (int, float, array, ...) before pickling.
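One way this conversion could look, as a rough sketch. The Media class, its media_id and likes attributes, and the to_plain helper are all hypothetical; the real instagram objects will have different fields, so adapt the helper to whatever you actually need to keep:

```python
import pickle


class Media(object):
    """Hypothetical API object; not the real instagram.models.Media."""

    def __init__(self, media_id, likes):
        self.media_id = media_id
        self.likes = likes


def to_plain(media):
    # Keep only built-in types so that unpickling needs no third-party import.
    return {"media_id": str(media.media_id), "likes": int(media.likes)}


records = [to_plain(m) for m in [Media("123", 42), Media("456", 7)]]
payload = pickle.dumps(records)

# The class name no longer appears in the pickled bytes.
print(b"Media" in payload)

# Loading works even where the Media class is unavailable.
restored = pickle.loads(payload)
print(restored)
```

With a DataFrame, the same helper could be applied to the offending column before saving, for example df["media"] = df["media"].map(to_plain) followed by df.to_pickle(path), where the column name "media" is an assumption.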