K-Means plot resulting in unreal points in R

Question

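I'm trying to plot K-Means clusters to analyze different product categories based on their average inventory and average quantity sold.

All values are non-negative and share the same unit of measurement. I don't know what I'm doing wrong, but the resulting plot contains points with negative values. In fact, I believe none of the points shown in the plot are actual points from my data.

Here is my code: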

reduced_dataset = dataset[1:20, 4:5]

# Using the elbow method to find the optimal number of clusters
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(reduced_dataset, i)$withinss)
plot(1:10,
 wcss,
 type = 'b',
 main = paste('The Elbow Method'),
 xlab = 'Number of clusters',
 ylab = 'WCSS')

# As a result, number of clusters should be 2

# Fitting K-Means to the dataset
# (result stored under a name that does not shadow the kmeans() function)
kmeans_fit = kmeans(x = reduced_dataset, centers = 2)
y_kmeans = kmeans_fit$cluster

# Visualising the clusters
library(cluster)
clusplot(reduced_dataset,
     y_kmeans,
     lines = 0,
     shade = TRUE,
     color = TRUE,
     labels = 2,
     plotchar = FALSE,
     span = TRUE,
     main = paste('Clusters of categories - NOT ON SALE'),
     xlab = 'Average Sold Quantity',
     ylab = 'Average Inventory')

dput(reduced_dataset):

structure(list(Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 
0.458651842, 0.784966698, 1.642746914, 0.115264798, 7.50338696, 
0.487603306, 1.023373984, 0.956099815, 1.505901506, 0.253837072, 
0.834963325, 0.880898876, 6.527699531, 11.54054054, 3.44077135, 
0.750182882, 0.251033058, 1.875698324), Avg_Inventory_No_Promo = 
c(6.068672335, 
22.57865326, 9.00694927, 11.56137012, 28.47530864, 7.485981308, 
170.9064352, 11.07438017, 22.80792683, 40.63863216, 41.73463573, 
10.87603306, 35.87408313, 46.09213483, 185.5671362, 315.6015693, 
165.1129477, 78.18032187, 9.65857438, 198.4385475)), .Names = 
c("Avg_Sold_No_Promo", 
"Avg_Inventory_No_Promo"), row.names = c(NA, 20L), class = "data.frame")

Can anyone help me?

Answer 1

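The clusplot function does this automatically.

It is called PCA (principal component analysis): clusplot draws the data on its first two principal components rather than on your original variables, which is also why the plot reports how much of the point variability those components explain. Component scores are centered on zero, so negative values are expected even when every input value is non-negative.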
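
A minimal sketch of this, using the data from the question. The clusplot documentation states that for a data matrix it calls princomp(*, cor = (ncol(x) > 2)), so with two columns the call below should mirror it. The names `pca`, `scores` and `fit` and the seed value are only illustrative, and the final plot is one possible way to see the clusters in the original units, not the only one.

```r
# Data from the question (copied from the dput output)
reduced_dataset <- data.frame(
  Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 0.458651842, 0.784966698,
                        1.642746914, 0.115264798, 7.50338696, 0.487603306,
                        1.023373984, 0.956099815, 1.505901506, 0.253837072,
                        0.834963325, 0.880898876, 6.527699531, 11.54054054,
                        3.44077135, 0.750182882, 0.251033058, 1.875698324),
  Avg_Inventory_No_Promo = c(6.068672335, 22.57865326, 9.00694927, 11.56137012,
                             28.47530864, 7.485981308, 170.9064352, 11.07438017,
                             22.80792683, 40.63863216, 41.73463573, 10.87603306,
                             35.87408313, 46.09213483, 185.5671362, 315.6015693,
                             165.1129477, 78.18032187, 9.65857438, 198.4385475)
)

# PCA on the covariance matrix, as documented for clusplot with two columns
pca <- princomp(reduced_dataset, cor = FALSE)
scores <- pca$scores[, 1:2]

# The scores are centered, so their column means are (numerically) zero
# and some of them are necessarily negative
print(colMeans(scores))
print(any(scores < 0))

# Plotting the clusters on the original variables instead
set.seed(42)  # arbitrary seed, only to make the k-means start reproducible
fit <- kmeans(reduced_dataset, centers = 2)
plot(reduced_dataset$Avg_Sold_No_Promo,
     reduced_dataset$Avg_Inventory_No_Promo,
     col = fit$cluster,
     pch = 19,
     main = 'Clusters of categories - NOT ON SALE',
     xlab = 'Average Sold Quantity',
     ylab = 'Average Inventory')

# Mark the cluster centers, which are also in the original units
points(fit$centers, col = 1:2, pch = 4, cex = 2, lwd = 2)
```

In this second plot every point is a row of your data, so nothing falls below zero. Note that the xlab and ylab passed to clusplot in the question only relabel the component axes; they do not make the axes show sold quantity or inventory.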

r

cluster-analysis

k-means