How to do K-NN on a bag of words

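I have a training set and a test set of equal size. I have already built the bag-of-words model and am trying to run K-nearest neighbors on it, but I am not sure how to do the fit.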

Bag-of-words model:

from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')

bow = bow_vectorizer.fit(TrainData)
print(bow_vectorizer.vocabulary_)
bowTrain = bow_vectorizer.fit_transform(TrainData)
bowTest = bow_vectorizer.fit_transform(TestData)

My attempt at KNN on the bag-of-words model is below, but I am not sure what I should put in the "knn.fit" call:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors = 3)
knn.fit(bowTrain, ???? )
predict = knn.predict(bowTest[0:5000])
from sklearn.feature_extraction.text import CountVectorizer
bow_vectorizer = CountVectorizer(max_features=100, stop_words='english')

X_train = TrainData
# y_train = your array of labels goes here (one label per entry of X_train)
bowVect = bow_vectorizer.fit(X_train)

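You should use the same fitted vectorizer for both sets. If you call fit_transform again on the test data, the vocabulary may change, and the test columns would no longer match the features the classifier was trained on.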

bowTrain = bowVect.transform(X_train)
bowTest = bowVect.transform(TestData)
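
To see why the vectorizer should be fitted only once, here is a minimal sketch on made-up data. The names train_docs, test_docs, shared, and refit are hypothetical stand-ins, not part of the original code. It compares reusing the fitted vectorizer against refitting on the test documents.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpora standing in for TrainData and TestData.
train_docs = ["the cat sat on the mat", "dogs chase cats"]
test_docs = ["birds fly south", "the cat flies"]

# Fit once on the training data, then reuse the fitted vectorizer.
shared = CountVectorizer()
train_matrix = shared.fit_transform(train_docs)
test_matrix = shared.transform(test_docs)

# Refitting on the test data learns a different vocabulary.
refit = CountVectorizer()
refit_matrix = refit.fit_transform(test_docs)

print(sorted(shared.vocabulary_))  # words learned from the training set
print(sorted(refit.vocabulary_))   # words learned from the test set
print(train_matrix.shape, test_matrix.shape, refit_matrix.shape)
```

With the shared vectorizer, the train and test matrices have the same number of columns and each column refers to the same word. The refitted matrix has its own columns, so a classifier trained on train_matrix could not interpret it correctly.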

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(bowTrain, y_train)
predict = knn.predict(bowTest[0:5000])
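
For reference, the whole flow can be sketched end to end on toy data. The documents and the "pos"/"neg" labels below are invented for illustration; in your case they would be TrainData, your own label array, and TestData. The second argument to knn.fit is simply the list of labels, one per training document.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data; replace with your own documents and labels.
train_docs = [
    "great movie loved it",
    "wonderful film great acting",
    "loved the wonderful story",
    "terrible movie hated it",
    "awful film terrible acting",
    "hated the awful story",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_docs = ["great wonderful acting", "terrible awful story"]

# Learn the vocabulary from the training documents only.
vectorizer = CountVectorizer(max_features=100, stop_words="english")
bow_train = vectorizer.fit_transform(train_docs)
# Map the test documents onto the same vocabulary (no refitting).
bow_test = vectorizer.transform(test_docs)

# Fit on the training features and their labels, then predict.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(bow_train, train_labels)
predictions = knn.predict(bow_test)
print(predictions)
```

The labels must have the same length as the number of rows in the training matrix. predict returns one label per test document.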