Why does the number of samples assigned to clusters decrease as the number of clusters increases?
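This is the code I use to cluster time series data. My data consists of 12153 samples of equal length.
When I cluster the data, I noticed that the number of samples assigned to clusters falls short of the total by the number of clusters. For example, with two clusters only 12151 samples are assigned; with 3 clusters, 12150; and so on. I don't understand why this happens. Is there something wrong in the code below?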
def k_means_clust_eucl(self, data, initial_centroids):
    '''
    k-means clustering algorithm for time series data.
    using Euclidean distance
    '''
    # create random centroids
    while True:
        orig = [i for i in range(12153)]
        self.new_centroids = deepcopy(self.centroids)
        # print('iteration ' + str(self.i))
        # assign data points to clusters
        self.assignments = {}
        # print('while_clustering :', len(data))
        for ind, i in enumerate(data):
            min_dist = float('inf')
            closest_clust = None
            for c_ind, j in enumerate(self.centroids):
                cur_dist = self.euclid_dist(i, j)
                if cur_dist < min_dist:
                    min_dist = cur_dist
                    closest_clust = c_ind
            if closest_clust in self.assignments:
                self.assignments[closest_clust].append(ind)
                if ind in orig:
                    orig.remove(ind)
                else:
                    print(ind)
            else:
                print('not in assignment')
                self.assignments[closest_clust] = []
        print(orig)
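Because you forgot to put the first point of each cluster into the newly created cluster.
Instead, after that first point, the cluster's list is still []. The else branch creates the empty list but never appends ind, so every cluster loses exactly one sample: with k clusters, only 12153 - k samples end up assigned, which matches the 12151 and 12150 you are seeing.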
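One possible fix is sketched below, not necessarily the only one. It pulls the assignment step out into a standalone function so it can run on its own; `assign_points`, the module-level `euclid_dist`, and the toy data are hypothetical stand-ins for the method and data in the question. The only substantive change is that the sample index is appended even when the cluster's list has just been created.

```python
import math


def euclid_dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_points(data, centroids):
    """Map each centroid index to the indices of the samples nearest to it."""
    assignments = {}
    for ind, sample in enumerate(data):
        min_dist = float('inf')
        closest_clust = None
        for c_ind, centroid in enumerate(centroids):
            cur_dist = euclid_dist(sample, centroid)
            if cur_dist < min_dist:
                min_dist = cur_dist
                closest_clust = c_ind
        # setdefault creates the empty list on first use and returns it,
        # so the first sample of each cluster is appended as well
        assignments.setdefault(closest_clust, []).append(ind)
    return assignments


# Toy data (made up for illustration): two obvious groups
data = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.1]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

assignments = assign_points(data, centroids)
total = sum(len(members) for members in assignments.values())
print(assignments)  # {0: [0, 1, 4], 1: [2, 3]}
print(total)        # 5, every sample is assigned
```

In the original method, the equivalent change would be to write `self.assignments[closest_clust] = [ind]` in the else branch instead of `[]`.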