How to export / save a fitted model after cross_validate and use it later on pandas
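I am using the sklearn cross_validate function to fit a RandomForest classifier.
I would like to know whether there is a way to export the fitted models, so that I can save them and import them later to predict new data.
I tried the return_estimator=True option
[return_estimator : boolean, default False Whether to return the estimators fitted on each split.]
and then joblib to save the estimators. But when I load the saved model and try to use it to predict, I get an error (see below).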
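import os
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
rfc = RandomForestClassifier(n_estimators=100)
cv_results = cross_validate(rfc, X_train_std, Y_train, scoring=scoring, cv=5, return_estimator=True)
rfc_fit = cv_results['estimator']
# save the fitted estimators
savedir = ('C://Users//.......//src//US//')
filename = os.path.join(savedir, 'final_model.joblib')
joblib.dump(rfc_fit, filename)
rfc_model2 = joblib.load(filename)
# this call fails: rfc_model2 is a collection of estimators, not a single model
bla = rfc_model2.predict(X_test_std)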
AttributeError: 'tuple' object has no attribute 'predict'
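I think I am confused about what return_estimator actually returns.
It looks like they are not fitted models. So, is there a way to extract the models fitted during cross-validation in order to reuse them?
Thanks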
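With return_estimator=True, cv_results['estimator'] is a 'tuple' of all the fitted models, one per split (newer scikit-learn versions may return a list instead, which is indexed the same way).
To solve this, you need to select the model you want, save it, load it and then predict.
Example: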
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate
diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = linear_model.Lasso()
cv_results = cross_validate(lasso, X, y, cv=3, return_estimator=True)
rfc_fit = cv_results['estimator']
print(rfc_fit)
The above prints 3 models:
(Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
normalize=False, positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False), Lasso(alpha=1.0,
copy_X=True, fit_intercept=True, max_iter=1000, normalize=False,
positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False), Lasso(alpha=1.0,
copy_X=True, fit_intercept=True, max_iter=1000, normalize=False,
positive=False, precompute=False, random_state=None,
selection='cyclic', tol=0.0001, warm_start=False))
To see how many models it contains, do the following:
print(len(rfc_fit))
# 3
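If position in the collection is not a meaningful criterion, one possible way to choose is by the per-fold score. The following is only a minimal sketch, assuming the default scorer (so the scores are stored under cv_results['test_score']); best_idx and best_model are names made up for illustration.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

cv_results = cross_validate(linear_model.Lasso(), X, y, cv=3, return_estimator=True)

# one score per fold (R^2 here, the default scorer for a regressor)
scores = cv_results['test_score']
best_idx = int(np.argmax(scores))

# the estimators are stored in the same fold order as the scores
best_model = cv_results['estimator'][best_idx]
print(best_idx, scores[best_idx])
```

Keep in mind that each of these estimators was trained on only part of the data, so the highest fold score does not guarantee the best behaviour on new data.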
Suppose you want to select the first model:
# select the first model
rfc_fit = rfc_fit[0]
# save it (savedir is the directory defined in the question)
import os
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
filename = os.path.join(savedir, 'final_model.joblib')
joblib.dump(rfc_fit, filename)
# load it
rfc_model2 = joblib.load(filename)
predict now works fine:
predicted = rfc_model2.predict(X)
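It should also be possible to keep every fold model: dump the whole collection, then index into it after loading, which is the step missing from the code in the question. The sketch below is a self-contained illustration under a few assumptions: a temporary directory stands in for the real save directory, and fold_models, loaded_models and the file name all_fold_models.joblib are hypothetical names.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_validate

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]

cv_results = cross_validate(linear_model.Lasso(), X, y, cv=3, return_estimator=True)
fold_models = cv_results['estimator']

# hypothetical location: a temporary directory stands in for the real savedir
with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, 'all_fold_models.joblib')
    joblib.dump(fold_models, filename)     # saves the whole collection
    loaded_models = joblib.load(filename)  # still a collection, not a single model

# index into the collection before calling predict
predicted = loaded_models[0].predict(X)

# sanity check: the reloaded model agrees with the in-memory one
print(np.allclose(predicted, fold_models[0].predict(X)))
```

Calling loaded_models.predict(X) directly would reproduce the AttributeError from the question, because the collection itself has no predict method.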