kmeans bug when specifying starting cluster centers in R?

Question

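I'm trying to run kmeans step by step in R. When I set iter.max = 1 and pass starting cluster centers instead of k, the algorithm seems to run until it converges rather than stopping after the requested single iteration.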

Can anyone confirm whether this is a known bug? If not, what am I missing?

Here is my code for reference:

# Set up data
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"), 
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))

# Starting centers, one per row (matrix() fills column-wise): (2, 10), (5, 8), (1, 2)
initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol=2)

# Run k-means for 1 iteration
model <- kmeans(data[,-1], initial_centers, iter.max=1)
model$centers

# Actual Output:
#          x        y
# 1 3.666667 9.000000
# 2 7.000000 4.333333
# 3 1.500000 3.500000

# Expected Output:
#          x        y
# 1 2.000000 10.00000
# 2 6.000000 6.000000
# 3 1.500000 3.500000

Answer 1

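R's default k-means algorithm is smarter than the one you probably learned in class: it is the algorithm of Hartigan and Wong. Instead of reassigning all points first and recomputing the centers once, it moves individual points between clusters and updates the affected centers immediately, so a single "iteration" can already reach the converged solution shown in your actual output.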
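For comparison, here is a minimal sketch, assuming the textbook step is what you want: `kmeans()` also accepts `algorithm = "Lloyd"`, which assigns every point to its nearest center and then recomputes each center as the mean of its points. With `iter.max = 1` and the question's starting centers this should give the expected output. Because the algorithm stops before convergence, R emits a "did not converge" warning, which the sketch suppresses on purpose.

```r
# Data and starting centers copied from the question
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"),
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))
initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol = 2)

# "Lloyd": assign each point to its nearest center, then average each cluster.
# Stopping after one step triggers a "did not converge" warning, suppressed here.
model_lloyd <- suppressWarnings(
  kmeans(data[, -1], initial_centers, iter.max = 1, algorithm = "Lloyd")
)

# Centers (2, 10), (6, 6) and (1.5, 3.5), i.e. the "Expected Output" above
model_lloyd$centers
```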

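If you want to assign each point to the nearest predefined center, don't misuse kmeans for that. Instead, compute the distances yourself and take the argmin.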
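The distance-plus-argmin approach can be sketched in base R as follows. `assign_and_update()` is a hypothetical helper written for this illustration, not an existing function: it computes squared Euclidean distances from every point to every given center, picks the closest center per point with `which.min()`, and then averages each cluster once, which is exactly one textbook k-means step. It assumes every cluster receives at least one point (an empty cluster would get `NaN` coordinates) and that there is more than one data point.

```r
# Hypothetical helper, not part of base R: one nearest-center assignment
# followed by one center update (a single textbook k-means step).
assign_and_update <- function(points, centers) {
  points <- as.matrix(points)
  centers <- as.matrix(centers)
  k <- nrow(centers)
  # n x k matrix of squared Euclidean distances (points in rows, centers in columns)
  d2 <- vapply(seq_len(k),
               function(j) rowSums(sweep(points, 2, centers[j, ])^2),
               numeric(nrow(points)))
  # argmin: index of the closest center for every point
  cluster <- apply(d2, 1, which.min)
  # Each new center is the mean of its assigned points (NaN if a cluster is empty)
  new_centers <- t(vapply(seq_len(k),
                          function(j) colMeans(points[cluster == j, , drop = FALSE]),
                          numeric(ncol(points))))
  list(cluster = cluster, centers = new_centers)
}

# Usage with the data and starting centers from the question
data <- data.frame(names = c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2"),
                   x = c(2, 2, 8, 5, 7, 6, 1, 4),
                   y = c(10, 5, 4, 8, 5, 4, 2, 9))
initial_centers <- matrix(c(2, 5, 1, 10, 8, 2), ncol = 2)

step1 <- assign_and_update(data[, -1], initial_centers)
step1$cluster  # 1 3 2 2 2 2 3 2
step1$centers  # rows (2, 10), (6, 6), (1.5, 3.5)
```

If you only need the assignment, `step1$cluster` is the argmin result and the center update can be dropped.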
