Does KNN need training?

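The idea behind KNN is to find the data points closest to the point you want to classify.

So there is no math or processing to do before testing the model.

All it does is find the K nearest points, which means there is no training process.

If that is correct, then what happens during the training step of KNN in Python?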

from sklearn.neighbors import KNeighborsClassifier  
classifier = KNeighborsClassifier(n_neighbors=5)  
classifier.fit(X_train, y_train) 

Something is happening in the background when fit is called.

What is it, if the process requires no computation?

KNN is not really a specific algorithm in itself, but rather a method that you can implement in several ways. The idea behind nearest neighbors is to select one or more examples from the training data to decide the predicted value for the sample at hand. The simplest way to do that is to iterate through the whole dataset and pick the closest data points from the training dataset. In that case, you could skip the fitting step, or you could see the fitting as the production of a callable function that runs that loop. Even in that case, if you are using a library like scikit-learn, it is useful to maintain a similar interface for all the predictors, so you can write generic code for them (e.g. training code that is independent of the specific algorithm used).
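To make the brute-force case concrete, here is a minimal sketch in plain Python. The class name BruteForceKNN and the toy data are made up for illustration; this is not how scikit-learn implements it, only an example of a fit step that stores the data and defers all the work to predict.

```python
from collections import Counter
import math


class BruteForceKNN:
    """Hypothetical minimal KNN classifier; not scikit-learn's implementation."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the data; no distances are computed here.
        self.X_ = [tuple(row) for row in X]
        self.y_ = list(y)
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # All the work happens at prediction time: scan every stored point
            # and sort the indices by Euclidean distance to the query.
            order = sorted(
                range(len(self.X_)),
                key=lambda i: math.dist(query, self.X_[i]),
            )
            nearest_labels = [self.y_[i] for i in order[: self.n_neighbors]]
            # Majority vote among the k nearest labels.
            predictions.append(Counter(nearest_labels).most_common(1)[0][0])
        return predictions


# Toy data, invented for this example: two well-separated clusters.
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y_train = ["a", "a", "a", "b", "b", "b"]

model = BruteForceKNN(n_neighbors=3).fit(X_train, y_train)
result = model.predict([(0.05, 0.05), (5.0, 5.1)])
print(result)  # ['a', 'b']
```

Keeping the fit/predict pair even though fit does almost nothing is what lets this class be swapped in wherever generic training code expects that interface.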

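However, you can also do smarter things for KNN. In scikit-learn, you can see that KNeighborsClassifier implements three different algorithms. One is brute force, which is just traversing the whole dataset as described, but you also have BallTree and KDTree. These are data structures that can accelerate the nearest neighbor search, but they need to be built ahead of time from the data. So the fitting step here is building the data structure that will help you find the nearest neighbors.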
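As a rough illustration, the search strategy can be chosen through the algorithm parameter of KNeighborsClassifier. The sketch below uses synthetic data invented for the example and assumes NumPy and scikit-learn are installed. For "kd_tree" and "ball_tree", the call to fit is where the tree gets built; for "brute", fit essentially just keeps the training data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, made up for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = (X_train[:, 0] > 0).astype(int)
X_test = rng.normal(size=(20, 3))

predictions = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    classifier = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm)
    # For the tree-based options, fit builds the search structure here.
    classifier.fit(X_train, y_train)
    predictions[algorithm] = classifier.predict(X_test)

# The three strategies search for the same neighbors, so on data like this
# the predictions should agree; only the search cost differs.
for algorithm, predicted in predictions.items():
    print(algorithm, predicted)
```

The default is algorithm="auto", which lets scikit-learn pick a strategy based on the data passed to fit.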