K-Means plot shows unreal points in R
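I am trying to plot K-Means clusters to analyze different product categories by their average inventory and average sold quantity.
All values are non-negative and share the same unit of measurement.
I don't know what I am doing wrong, but the resulting plot contains points with negative values. In fact, I believe none of the points shown in the plot are actual points from my data.
Here is my code: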
reduced_dataset = dataset[1:20, 4:5]

# Using the elbow method to find the optimal number of clusters
wcss = vector()
for (i in 1:10) wcss[i] = sum(kmeans(reduced_dataset, i)$withinss)
plot(1:10,
     wcss,
     type = 'b',
     main = paste('The Elbow Method'),
     xlab = 'Number of clusters',
     ylab = 'WCSS')
# As a result, number of clusters should be 2

# Fitting K-Means to the dataset
# (result stored as kmeans_fit so it does not shadow the kmeans() function)
kmeans_fit = kmeans(x = reduced_dataset, centers = 2)
y_kmeans = kmeans_fit$cluster

# Visualising the clusters
library(cluster)
clusplot(reduced_dataset,
         y_kmeans,
         lines = 0,
         shade = TRUE,
         color = TRUE,
         labels = 2,
         plotchar = FALSE,
         span = TRUE,
         main = paste('Clusters of categories - NOT ON SALE'),
         xlab = 'Average Sold Quantity',
         ylab = 'Average Inventory')

dput(reduced_dataset):
structure(list(
  Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 0.458651842,
    0.784966698, 1.642746914, 0.115264798, 7.50338696, 0.487603306,
    1.023373984, 0.956099815, 1.505901506, 0.253837072, 0.834963325,
    0.880898876, 6.527699531, 11.54054054, 3.44077135, 0.750182882,
    0.251033058, 1.875698324),
  Avg_Inventory_No_Promo = c(6.068672335, 22.57865326, 9.00694927,
    11.56137012, 28.47530864, 7.485981308, 170.9064352, 11.07438017,
    22.80792683, 40.63863216, 41.73463573, 10.87603306, 35.87408313,
    46.09213483, 185.5671362, 315.6015693, 165.1129477, 78.18032187,
    9.65857438, 198.4385475)),
  .Names = c("Avg_Sold_No_Promo", "Avg_Inventory_No_Promo"),
  row.names = c(NA, 20L), class = "data.frame")
Can anyone help me?

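The clusplot function does this automatically: it does not plot your raw variables, it first projects the data onto its principal components.
This is called PCA (principal component analysis), which is also why the plot reports how much of the variability the two components explain. The component scores are centered, so they can be negative even when every original value is non-negative.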
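
The following is a minimal sketch of that effect, using the data from the question. It uses prcomp from base R only to illustrate the centering step and is not necessarily the exact routine clusplot calls internally. The names fit and scores and the seed value are illustrative assumptions. The last part plots the clusters on the original axes, which is one way to see the real data points.

```r
# Data copied from the dput() output in the question
reduced_dataset <- data.frame(
  Avg_Sold_No_Promo = c(0.255722695, 1.139983236, 0.458651842,
    0.784966698, 1.642746914, 0.115264798, 7.50338696, 0.487603306,
    1.023373984, 0.956099815, 1.505901506, 0.253837072, 0.834963325,
    0.880898876, 6.527699531, 11.54054054, 3.44077135, 0.750182882,
    0.251033058, 1.875698324),
  Avg_Inventory_No_Promo = c(6.068672335, 22.57865326, 9.00694927,
    11.56137012, 28.47530864, 7.485981308, 170.9064352, 11.07438017,
    22.80792683, 40.63863216, 41.73463573, 10.87603306, 35.87408313,
    46.09213483, 185.5671362, 315.6015693, 165.1129477, 78.18032187,
    9.65857438, 198.4385475)
)

# Seed chosen arbitrarily so the random starts are reproducible
set.seed(42)
fit <- kmeans(reduced_dataset, centers = 2, nstart = 10)

# PCA subtracts each column mean before rotating, so the scores
# average to zero and some of them are necessarily negative.
pca <- prcomp(reduced_dataset, center = TRUE, scale. = FALSE)
scores <- pca$x
print(range(scores[, "PC1"]))

# Plot the clusters on the original (non-negative) axes instead
plot(reduced_dataset$Avg_Sold_No_Promo,
     reduced_dataset$Avg_Inventory_No_Promo,
     col = fit$cluster,
     pch = 19,
     main = 'Clusters of categories - NOT ON SALE',
     xlab = 'Average Sold Quantity',
     ylab = 'Average Inventory')
# Mark the cluster centers with crosses
points(fit$centers, col = 1:2, pch = 4, cex = 2, lwd = 2)
```

With only two variables nothing is lost by skipping the projection, so the plain scatter plot shows the same cluster assignments with coordinates that match the data.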