SKlearn SGD Partial Fit error: Number of features 378 does not match previous data 4598


I pickled my classifier, loaded it in another notebook, and tried to call partial_fit on it, but I get the error: Number of features 378 does not match previous data 4598.

import joblib

# Pickle files must be opened in binary mode ("rb")
with open("models/count_vect_Item Group.pkl", "rb") as f:
    count_vect_item_group = joblib.load(f)

with open("models/model_Item Group.pkl", "rb") as f:
    model_predicted_item_group = joblib.load(f)

count_matrix_X_train = count_vect_item_group.fit_transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)

model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)

I can't continue training my classifier on the new dataset.

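This error occurs because, before you pickled the classifier, you trained it on 4598 features (the number of columns in X), and now you are passing it only 378. The new data must have exactly the same features, in the same column order, as the data the model was originally trained on.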
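A minimal sketch that reproduces the mismatch on hypothetical toy data (the documents, labels, and variable names are made up for illustration). The classifier is trained on 6 vocabulary columns, the vectorizer is then refit on a new batch that yields only 4 columns, and the second partial_fit call raises a ValueError. The exact wording of the message depends on the scikit-learn version.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy training data
train_docs = ["red apple", "green apple", "yellow banana", "ripe banana"]
train_labels = ["apple", "apple", "banana", "banana"]

vect = CountVectorizer()
X_train = vect.fit_transform(train_docs)  # vocabulary of 6 terms -> 6 columns

clf = SGDClassifier(random_state=0)
# The first partial_fit call must list every class the model will ever see
clf.partial_fit(X_train, train_labels, classes=["apple", "banana"])

# New batch: fit_transform() throws the old vocabulary away and learns a new one
new_docs = ["fresh apple", "sweet banana"]
new_labels = ["apple", "banana"]
X_new_wrong = vect.fit_transform(new_docs)  # only 4 terms -> 4 columns

print(X_train.shape[1], X_new_wrong.shape[1])  # 6 4

raised = False
try:
    clf.partial_fit(X_new_wrong, new_labels)
except ValueError as exc:
    # The classifier still expects 6 input columns
    raised = True
    print("ValueError:", exc)
```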

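To keep the features identical, call only count_vect_item_group.transform(). Right now you call fit_transform() on count_vect_item_group again, which discards the vocabulary it learned before pickling and learns a new one from X_test alone. That new vocabulary contains only the distinct terms found in the new data, so the resulting matrix has 378 columns instead of 4598. transform() reuses the saved vocabulary, so its output always has the original 4598 columns; terms that did not occur in the original training data are simply ignored.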
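A short sketch, again on hypothetical toy data, that contrasts the two calls on an already-fitted CountVectorizer: transform() keeps the column count of the original vocabulary and drops unseen words, while fit_transform() overwrites the vocabulary and changes the width.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data
train_docs = ["red apple", "green apple", "yellow banana", "ripe banana"]
new_docs = ["fresh apple", "sweet banana"]

vect = CountVectorizer().fit(train_docs)  # learns 6 terms

# transform(): same vocabulary, same 6 columns; "fresh" and "sweet" are dropped
X_right = vect.transform(new_docs)
fresh_known_before = "fresh" in vect.vocabulary_  # False: not in training vocab
print(X_right.shape)  # (2, 6)

# fit_transform(): replaces the vocabulary with the 4 terms of new_docs
X_wrong = vect.fit_transform(new_docs)
print(X_wrong.shape)  # (2, 4)
```

Note that the second call also mutates vect itself, so after it even transform() would produce 4 columns.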

Change your code to:

# transform() reuses the vocabulary learned before pickling, so the column count stays at 4598
count_matrix_X_train = count_vect_item_group.transform(X_test)
X_train_tf_idf = tf_idf(count_matrix_X_train)

model_predicted_item_group.partial_fit(X_train_tf_idf, labels_test)
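The question's tf_idf helper is not shown. If it fits a fresh TF-IDF transformer on every call, the column count will now match, but the IDF weights will be relearned from the new batch only. The sketch below assumes the helper can be replaced by a scikit-learn TfidfTransformer that is fitted once and pickled next to the vectorizer and the model; the file names and the temporary directory standing in for models/ are hypothetical. Every fitted preprocessing step is saved, and the second "notebook" only calls transform() before partial_fit.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier

# --- Notebook 1: fit everything once and persist each fitted step ---
train_docs = ["red apple", "green apple", "yellow banana", "ripe banana"]
train_labels = ["apple", "apple", "banana", "banana"]

count_vect = CountVectorizer()
tfidf = TfidfTransformer()
X = tfidf.fit_transform(count_vect.fit_transform(train_docs))

clf = SGDClassifier(random_state=0)
clf.partial_fit(X, train_labels, classes=["apple", "banana"])

model_dir = tempfile.mkdtemp()  # hypothetical stand-in for "models/"
joblib.dump(count_vect, os.path.join(model_dir, "count_vect.pkl"))
joblib.dump(tfidf, os.path.join(model_dir, "tfidf.pkl"))
joblib.dump(clf, os.path.join(model_dir, "model.pkl"))

# --- Notebook 2: reload and continue training ---
count_vect2 = joblib.load(os.path.join(model_dir, "count_vect.pkl"))
tfidf2 = joblib.load(os.path.join(model_dir, "tfidf.pkl"))
clf2 = joblib.load(os.path.join(model_dir, "model.pkl"))

new_docs = ["fresh apple", "sweet banana"]
new_labels = ["apple", "banana"]

# Only transform(): reuse the saved vocabulary and IDF weights
X_new = tfidf2.transform(count_vect2.transform(new_docs))

# classes is only required on the very first partial_fit call
clf2.partial_fit(X_new, new_labels)
print(X_new.shape[1] == clf2.coef_.shape[1])  # True
```

Any label passed to later partial_fit calls must be one of the classes given on the first call.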