kmeans cluster count does not match the k value
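The code below, based on this article, works as expected when I define only 3 clusters. But when I change the number of clusters, I do not get that many clusters back.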
from matplotlib import image as img
from matplotlib import pyplot as plt
import pandas as pd
from scipy.cluster.vq import kmeans

image = img.imread("my_logo1.jpg")
image.shape

# Collect each color channel, scaled to the range [0, 1]
r = []
g = []
b = []
for line in image:
    for pixel in line:
        temp_r, temp_g, temp_b = pixel
        r.append(temp_r / 255)
        g.append(temp_g / 255)
        b.append(temp_b / 255)

df = pd.DataFrame({"red": r, "green": g, "blue": b})

# Ask for 7 clusters
cluster_centers, distortion = kmeans(df[["red", "green", "blue"]], 7)
print(cluster_centers)
Only 3 cluster centers are returned, but I expected 7.
I expect the call to return as many colors as the number of clusters passed to the kmeans function.
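Reading the source code of the kmeans() function, you can note the use of a supporting function, _kmeans(), where you can find:
code_book = code_book[has_members]
has_members is a boolean array indicating which clusters have members, resulting from _vq.update_cluster_means().
In short, when you specify the number of clusters k, the algorithm returns the set of centroids (at most k) with the lowest distortion. Empty clusters are simply dropped during the update step of K-means.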
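To see this behavior without the original image, here is a minimal sketch on synthetic data. The palette and the pixel array are assumptions made up for illustration: 300 "pixels" that take only 3 distinct colors, standing in for a logo with few colors. Asking for k = 7 cannot yield more non-empty clusters than there are distinct points, so the returned code book is shorter than k.

```python
import numpy as np
from scipy.cluster.vq import kmeans

# Hypothetical stand-in for the logo: only 3 distinct RGB colors, 100 pixels each.
palette = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
pixels = np.repeat(palette, 100, axis=0)  # shape (300, 3)

k = 7
centers, distortion = kmeans(pixels, k)

# Number of distinct colors actually present in the data.
n_distinct = len(np.unique(pixels, axis=0))

print("requested clusters:", k)
print("distinct colors:   ", n_distinct)
print("centroids returned:", len(centers))
```

If the real image behaves the same way, comparing len(cluster_centers) with the number of distinct rows in df is a quick way to check whether the data simply has fewer separable colors than the requested k.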