Why do some attributes go missing after a Python object is pickled and reloaded?
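I've run into a problem where some attributes of an instance go missing after the instance is dumped to a pickle file and loaded back. Could anyone help explain why? Thanks!
Here is a concrete example: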
File/directory hierarchy:
-test
    -test_module
        -__init__.py
        -myDataFrameMapper.py
        -mySklearn.py
    -main.py
__init__.py:
from .mySklearn import mySklearn
mySklearn.py:
import sklearn_pandas as sk_pd
from .myDataFrameMapper import myDataFrameMapper
class mySklearn:
    @staticmethod
    def initialize():
        # Monkey-patch a new method onto the third-party DataFrameMapper class.
        sk_pd.DataFrameMapper.myTransform = myDataFrameMapper.transform()
myDataFrameMapper.py:
import numpy as np
from sklearn_pandas import DataFrameMapper
class myDataFrameMapper:
    @staticmethod
    def transform():
        # Build the function that gets attached to DataFrameMapper as a method.
        def closure(self, df, **kwargs):
            self.addedKey = 'addedValue'  # a new attribute is added here
        return closure
main.py
import pandas as pd
import pickle
import random
from sklearn_pandas import DataFrameMapper
from sklearn.preprocessing import StandardScaler, LabelEncoder
from test_module import mySklearn
mySklearn.initialize()
data = {'pet': ["cat", "dog", "dog", "fish", "cat", "dog", "cat", "fish"],
        'children': [4., 6, 3, 3, 2, 3, 5, 4],
        'salary': [90, 24, 44, 27, 32, 59, 36, 27]}
df = pd.DataFrame(data)
column_tuples = [
    ('pet', LabelEncoder()),
    ('children', LabelEncoder()),
    ('salary', LabelEncoder())
]
mapper = DataFrameMapper(column_tuples, input_df=True)
mapper.fit(data)
print('original attributes in mapper:')
print(mapper.__dict__)
mapper.myTransform(df.iloc[[1]])
print('\nafter adding a new attributes \'addedKey\':')
print(mapper.__dict__)
print('\ndump the mapper into a pickle file...')
with open('mapper.pkl', 'wb') as picklefile:
    pickle.dump(mapper, picklefile)
print('\nload the mapper from the pickle file...')
with open('mapper.pkl', 'rb') as picklefile:
    mapper1 = pickle.load(picklefile)
print('\nafter being loaded, the attributes in the mapper are:')
print(mapper1.__dict__)
After running python3 main.py, we observe the following output:
original attributes in mapper:
{'built_default': False, 'sparse': False, 'input_df': True, 'df_out': False, 'features': [('pet', LabelEncoder()), ('children', LabelEncoder()), ('salary', LabelEncoder())], 'default': False, 'built_features': [('pet', LabelEncoder(), {}), ('children', LabelEncoder(), {}), ('salary', LabelEncoder(), {})], 'transformed_names_': []}
after adding a new attributes 'addedKey':
{'built_default': False, 'addedKey': 'addedValue', 'sparse': False, 'input_df': True, 'df_out': False, 'features': [('pet', LabelEncoder()), ('children', LabelEncoder()), ('salary', LabelEncoder())], 'default': False, 'built_features': [('pet', LabelEncoder(), {}), ('children', LabelEncoder(), {}), ('salary', LabelEncoder(), {})], 'transformed_names_': []}
dump the mapper into a pickle file...
load the mapper from the pickle file...
after being loaded, the attributes in the mapper are:
{'built_default': False, 'sparse': False, 'input_df': True, 'df_out': False, 'features': [('pet', LabelEncoder(), {}), ('children', LabelEncoder(), {}), ('salary', LabelEncoder(), {})], 'default': False, 'built_features': [('pet', LabelEncoder(), {}), ('children', LabelEncoder(), {}), ('salary', LabelEncoder(), {})], 'transformed_names_': []}
We can see that the attribute 'addedKey': 'addedValue' is lost when the mapper is loaded back from the pickle file.
sklearn_pandas.DataFrameMapper has a custom __setstate__ method, which attempts to maintain compatibility with pickles created on older versions. (Here's the method as of version 1.8.0.) This __setstate__ is responsible for restoring an unpickled instance's state, and it completely ignores your added attribute.
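To see the mechanism in isolation, here is a minimal sketch that does not use sklearn_pandas at all. The class KnownFieldsOnly and its attributes are hypothetical stand-ins, not the library's actual code; the only assumption is that, like DataFrameMapper, it defines a __setstate__ that restores a fixed set of attributes.

```python
import pickle


class KnownFieldsOnly:
    """Hypothetical stand-in for a class with a restrictive __setstate__."""

    def __init__(self, features):
        self.features = features
        self.sparse = False

    def __setstate__(self, state):
        # Restore only the attributes this class knows about. Anything else
        # that was present in the pickled state is silently dropped.
        self.features = state["features"]
        self.sparse = state.get("sparse", False)


obj = KnownFieldsOnly(["pet", "children"])
obj.addedKey = "addedValue"  # attribute attached from outside the class

# Default pickling writes the whole __dict__, so addedKey is in the pickle,
# but __setstate__ never copies it onto the new instance.
restored = pickle.loads(pickle.dumps(obj))

print(sorted(obj.__dict__))       # ['addedKey', 'features', 'sparse']
print(sorted(restored.__dict__))  # ['features', 'sparse']
```

So the attribute is not lost at dump time; it is discarded at load time, when __setstate__ decides what to put back.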
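Customized pickle implementations like this are one of the reasons why trying to add your own attributes to other people's classes is usually not a good idea.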
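One possible alternative, sketched below under the same assumptions, is to keep your extra state in an object you own and pickle that instead. ThirdPartyMapper and MapperWithExtras are hypothetical names used only for illustration; in your case the wrapped object would be the real DataFrameMapper.

```python
import pickle
from dataclasses import dataclass, field
from typing import Any


class ThirdPartyMapper:
    """Hypothetical stand-in for a library class with its own __setstate__."""

    def __init__(self, features):
        self.features = features

    def __setstate__(self, state):
        # Only the library's own attribute is restored.
        self.features = state["features"]


@dataclass
class MapperWithExtras:
    """Hypothetical wrapper that owns the extra state instead of patching the mapper."""

    mapper: Any
    extras: dict = field(default_factory=dict)


bundle = MapperWithExtras(ThirdPartyMapper(["pet", "salary"]))
bundle.extras["addedKey"] = "addedValue"

# The wrapper uses default pickling, so extras survives the round trip,
# while the wrapped mapper is still restored by its own __setstate__.
restored = pickle.loads(pickle.dumps(bundle))

print(restored.extras)           # {'addedKey': 'addedValue'}
print(restored.mapper.features)  # ['pet', 'salary']
```

This keeps your data out of the library's pickling logic entirely, so a future change to its __setstate__ cannot drop it.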