Python count occurrences of labels in KMeans

Question

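I am trying to compare the list of labels from sklearn's KMeans with the labels predicted for another dataset. The two label lists have different sizes, so I want the number of occurrences of each label.

I tried Counter, but it did not give me what I wanted. I am currently using np.unique, but there is still a problem.

For example: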

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=4, random_state=0).fit(X)

# return_counts=True returns how often each label occurs (return_index would return positions instead)
Unique, count = np.unique(kmeans.labels_, return_counts=True)
print(count) # [2 2 1 1] so far so good

New_Labels = kmeans.predict([[0, 4], [4, 4], [0, 5], [1, 6], [7, 2], [4, 0], [4, 2]])
print(New_Labels) # [3 0 3 3 0 2 0] also good

Unique1, count1 = np.unique(New_Labels, return_counts=True)

This is where my problem is:

print(count1) # [3 1 3]

If a cluster's label does not occur, I want its count to appear as 0. So I would like the counts for my predicted labels to be

[3 0 1 3]

Answer 1

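You can use the following list comprehension, which iterates over the cluster labels from 0 up to the largest one present and uses .count to count the occurrences of each. Here l is the predicted labels as a plain Python list:

l = New_Labels.tolist()
[l.count(i) for i in range(max(l)+1)]
# [3, 0, 1, 3]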
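
As an alternative sketch (not part of the answer above), NumPy's np.bincount with its minlength argument can produce the same zero-filled counts directly from the label array. The names new_labels, n_clusters, counts and trailing are illustrative, and the label values are copied from the question's reported output rather than recomputed with KMeans:

```python
import numpy as np

# Predicted labels as reported in the question, and the n_clusters used there.
new_labels = np.array([3, 0, 3, 3, 0, 2, 0])
n_clusters = 4

# minlength pads the result so labels that never occur still get a count of 0.
counts = np.bincount(new_labels, minlength=n_clusters)
print(counts.tolist())  # [3, 0, 1, 3]

# A missing *last* cluster is also reported, because the length comes from
# n_clusters rather than from the largest label present.
trailing = np.bincount(np.array([0, 0, 1]), minlength=n_clusters)
print(trailing.tolist())  # [2, 1, 0, 0]
```

Passing the number of clusters explicitly matters when the highest-numbered cluster receives no predictions: range(max(l)+1) would then stop early and return a shorter list.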

