Initial centroids in k-means

Question

I found this description online:

Start with the center of all points. Choose successively the point that is the furthest away from all centers as a center for the next cluster.

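So my understanding is:

center = the mean of all points

centroid1 = the point furthest from center

centroid2 = the point furthest from center AND centroid1

centroid3 = the point furthest from center AND centroid1 AND centroid2.

My question is: how should I calculate the point furthest from both center and centroid1? Do I average them and pick the point furthest from that midpoint? Or do I compute each point's distance to center and to centroid1 and pick the point with the largest of those distances? If so, wouldn't centroid3 end up equal to centroid1 or centroid2?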

Answer 1

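In the paper Centroids Initialization for K-Means Clustering using Improved Pillar Algorithm, "furthest" refers to the sum of the distances. So in the second step, for each point you add its distance to the first centroid to its distance to the mean of all points, and then choose the point with the largest sum.

The relevant lines of the pseudocode given in the paper are: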

2. Calculate D <- dis(X, m)
...
6. Set i = 1 as counter to determine the i-th initial centroid
7. DM = DM + D
8. Select x <- x_argmax(DM) as the candidate for i-th initial centroids

To select a next x for the candidate of the rest initial centroids, D_i (where i is the current iteration step) is recalculated between each data points and c_i-1 . The D_i is then added to the accumulated distance metric DM (DM <- DM + D_i).
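To make the accumulation concrete, here is a minimal Python sketch of just the quoted steps. It is not the paper's full algorithm: the steps elided by "..." in the pseudocode are left out, and the function name accumulated_distance_init, the use of Euclidean distance, and the rule that an already chosen point is skipped are my own assumptions for illustration.

```python
import math


def accumulated_distance_init(points, k):
    """Pick k initial centroids by accumulated distance (simplified sketch)."""
    n = len(points)
    dim = len(points[0])

    # m: the mean of all points (the "center" in the question).
    mean = [sum(p[d] for p in points) / n for d in range(dim)]

    # D: distance from every point to the current reference, starting with m.
    dist = [math.dist(p, mean) for p in points]

    accumulated = [0.0] * n  # DM: running sum of distances per point
    chosen = []              # indices of the points picked so far

    for _ in range(k):
        # DM = DM + D
        accumulated = [a + d for a, d in zip(accumulated, dist)]

        # Take the point with the largest accumulated distance.
        # Assumption: a point that was already picked is never picked again.
        best = max(
            (i for i in range(n) if i not in chosen),
            key=lambda i: accumulated[i],
        )
        chosen.append(best)

        # Recalculate D against the centroid that was just selected.
        dist = [math.dist(p, points[best]) for p in points]

    return [points[i] for i in chosen]


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (0, 1), (10, 10), (10, 0)]
    print(accumulated_distance_init(data, 3))
```

Because the distances are summed rather than compared one at a time, a point close to any earlier centroid gets little added to its total, so later picks tend to land away from all of the earlier ones.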
