The distance to the farthest and closest points for each cluster - kmeans
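In the figure below, I have two clusters of data. For a new data point (A), can I get, for each cluster, the distance from A to the farthest point (circled in red) and to the closest point (circled in purple)?
Put simply, for each cluster I need the distance from the new point A to that cluster's closest and farthest points.
Does sklearn provide a function for this, or do I have to do it manually?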
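The points you marked are not actually the closest and farthest ones. The point you circled as closest in the green cluster only looks close because the two axes of your plot are scaled differently. Euclidean distance will not give you that point as the closest one.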
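To make the scaling remark concrete, here is a small sketch with made-up coordinates (they are not taken from the question's figure). It compares the Euclidean distance in data units with the distance as it would appear on a plot whose two axes cover very different ranges; the axis spans of 4 and 100 are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical coordinates, chosen only to illustrate the effect.
a = np.array([0.0, 0.0])      # the new point A
p1 = np.array([1.0, 50.0])    # small x offset, large y offset
p2 = np.array([3.0, 5.0])     # larger x offset, small y offset

# Euclidean distance in data units.
raw = [float(np.linalg.norm(p - a)) for p in (p1, p2)]

# Apparent distance on a plot where the x axis spans 4 units and the
# y axis spans 100 units over the same on-screen length (assumed spans).
span = np.array([4.0, 100.0])
drawn = [float(np.linalg.norm((p - a) / span)) for p in (p1, p2)]

print("data units:", raw)
print("as drawn:  ", drawn)
```

In data units p2 is much closer to A, while on the stretched plot p1 appears closer, which is the kind of mismatch described above.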
Other than that, yes, you need to implement this yourself. Here is some sample code:
Code:
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics.pairwise import euclidean_distances

    # Training data and a 2-cluster k-means fit
    X = np.array([[1, 2], [1, 4], [1, 0],
                  [4, 2], [4, 4], [4, 0]])
    kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
    kmeans.predict([[0, 0], [4, 4]])  # cluster labels for new points (unused below)

    # New points and their distances to every training point
    data = np.array([[5, 0], [-4, 10], [0, 3]])
    dists = euclidean_distances(data, X)  # shape (len(data), len(X))

    for i in range(len(data)):
        print("data: %s" % str(data[i, :]))
        for x in range(kmeans.n_clusters):
            # Restrict the search to the members of cluster x, so that a
            # point from another cluster at the same distance is not reported
            members = X[kmeans.labels_ == x]
            member_dists = dists[i, kmeans.labels_ == x]
            min_dist = member_dists.min()
            max_dist = member_dists.max()
            print("cluster: %d\n\tclosest: %s: %g\n\tfarthest: %s: %g"
                  % (x,
                     str(members[member_dists == min_dist]),
                     min_dist,
                     str(members[member_dists == max_dist]),
                     max_dist))

Output:
    data: [5 0]
    cluster: 0
        closest: [[1 0]]: 4
        farthest: [[1 4]]: 5.65685
    cluster: 1
        closest: [[4 0]]: 1
        farthest: [[4 4]]: 4.12311
    data: [-4 10]
    cluster: 0
        closest: [[1 4]]: 7.81025
        farthest: [[1 0]]: 11.1803
    cluster: 1
        closest: [[4 4]]: 10
        farthest: [[4 0]]: 12.8062
    data: [0 3]
    cluster: 0
        closest: [[1 2]
     [1 4]]: 1.41421
        farthest: [[1 0]]: 3.16228
    cluster: 1
        closest: [[4 2]
     [4 4]]: 4.12311
        farthest: [[4 0]]: 5
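If only the distances are needed, not the points themselves, the same idea can be written without the inner bookkeeping. The following is a minimal sketch, not a scikit-learn feature: `cluster_distance_range` is a hypothetical helper name, and passing `n_init=10` explicitly is an assumption made so the fit behaves the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import euclidean_distances


def cluster_distance_range(new_points, X, labels):
    """Hypothetical helper: map each cluster label to (min_dists, max_dists),
    where each array holds one value per row of new_points."""
    dists = euclidean_distances(new_points, X)  # shape (n_new, n_train)
    result = {}
    for c in np.unique(labels):
        in_cluster = dists[:, labels == c]      # keep columns of cluster c only
        result[int(c)] = (in_cluster.min(axis=1), in_cluster.max(axis=1))
    return result


X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

data = np.array([[5, 0], [-4, 10], [0, 3]])
ranges = cluster_distance_range(data, X, kmeans.labels_)
for c, (nearest, farthest) in ranges.items():
    print("cluster %d: nearest %s, farthest %s" % (c, nearest, farthest))
```

Each array is indexed by new point, so `ranges[c][0][i]` is the distance from `data[i]` to the closest member of cluster `c`.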