SKlearn SGD Partial Fit error: Number of features 378 does not match previous data 4598
I pickled my classifier, loaded it in another notebook, and tried to call partial_fit on it, but I get the error: Number of features 378 does not match previous data 4598.
with open("models/count_vect_Item Group.pkl", 'r') as f:
    global count_vect_item_group
    count_vect_item_group = joblib.load(f)
with open("models/model_Item Group.pkl", 'r') as f:
    global model_predicted_item_group
    model_predicted_item_group = joblib.load(f)
count_matrix_X_train = count_vect_item_group.fit_transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)
model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)
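I am unable to train my classifier on the new dataset.
This error occurs because, before you pickled the classifier, you trained it on 4598 features (the number of columns in X), and now you are passing only 378. The number of features must be the same as before.
To achieve that, call only count_vect_item_group.transform(). Right now you are calling fit_transform() on count_vect_item_group again, which makes it discard the vocabulary it learned earlier and fit to the new data, so it finds fewer features than before.
Change your code to: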
count_matrix_X_train = count_vect_item_group.transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)
model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)
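To see the difference in isolation, here is a minimal sketch on toy data. The documents, labels, and variable names are made up for illustration, and scikit-learn's TfidfTransformer is only an assumed stand-in for the tf_idf() helper in the question, whose implementation is not shown. It reuses the fitted vectorizer with transform() before partial_fit, then shows that refitting a vectorizer on the new data alone yields a different number of columns, which the classifier rejects.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data standing in for the original training set.
train_docs = [
    "red apple fruit",
    "green pear fruit",
    "steel hammer tool",
    "iron wrench tool",
]
train_labels = ["food", "food", "hardware", "hardware"]

# Initial training: the vocabulary (and hence the column count) is learned here.
count_vect = CountVectorizer()
tfidf = TfidfTransformer()  # assumed stand-in for the question's tf_idf() helper
X_train = tfidf.fit_transform(count_vect.fit_transform(train_docs))
n_features = X_train.shape[1]

model = SGDClassifier(random_state=0)
# The first partial_fit call must be told every class that can ever appear.
model.partial_fit(X_train, train_labels, classes=["food", "hardware"])

# Hypothetical new batch arriving later (for example, after unpickling).
new_docs = ["red hammer", "green fruit basket"]
new_labels = ["hardware", "food"]

# Correct: transform() reuses the learned vocabulary, so the width is unchanged.
# Words unseen during the original fit ("basket") are simply ignored.
X_new = tfidf.transform(count_vect.transform(new_docs))
model.partial_fit(X_new, new_labels)

# Incorrect: fitting a vectorizer on the new batch learns a smaller vocabulary.
X_bad = CountVectorizer().fit_transform(new_docs)
bad_n_features = X_bad.shape[1]
try:
    model.partial_fit(X_bad, new_labels)
    mismatch_raised = False
except ValueError:
    # Feature-count mismatch; the exact message varies by scikit-learn version.
    mismatch_raised = True

print(n_features, X_new.shape[1], bad_n_features, mismatch_raised)
```

One consequence of this design is that the vocabulary stays frozen: tokens that appear only in later batches never become features. If that matters, HashingVectorizer is a commonly suggested alternative for incremental learning, because its output width is fixed up front and does not depend on the data.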