pyclustering clarans.get_clusters() returns an empty list

Question

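I'm trying to use the pyclustering module for CLARANS clustering, but for all the data I have tried, every method of clarans(data, number_clusters, numlocal, maxneighbor) returns [] (an empty list), no matter what the parameter values are. I generated random data to test the method, but the result was the same. The only data that worked was the iris dataset, datasets.load_iris() from the sklearn module. Am I doing something wrong?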

Here is the test data:

import pandas as pd
import numpy as np
import seaborn as sns
from pyclustering.cluster.clarans import clarans

x1 = np.random.normal(10, 5, 100)
x2 = np.random.normal(30, 5, 100)
x = np.concatenate((x1, x2), axis=0)
y1 = np.random.normal(50, 5, 100)
y2 = np.random.normal(60, 5, 100)
y = np.concatenate((y1, y2), axis=0)
Gr = np.array(['G1']*100 + ['G2']*100)

df = pd.DataFrame(x, columns=['X'])
df['Y'] = y
df['Gr'] = Gr

And this is how I run the clustering (I convert df into a two-dimensional list):

datalist = np.zeros((200,2))
for i in df.index:
    datalist[i][0] = round(float(df['X'][i]), 2)
    datalist[i][1] = round(float(df['Y'][i]), 2)

cluster_clarans = clarans(datalist, 2, 6, 4)

cluster_clarans.get_clusters()

The result is:

[]

Answer 1

You forgot to run the clustering with the process() method before printing the clusters.

This statement

cluster_clarans = clarans(datalist, 2, 6, 4)

only initializes the class object. After that, you need to call the process method:

cluster_clarans.process()

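Now, when you print the clusters, you will get 2 lists, each containing the indices of the data points in one cluster. See the official documentation for get_clusters here.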

print(cluster_clarans.get_clusters())
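Since get_clusters() returns lists of point indices rather than one label per point, it can be handy to convert them into a label column, for example to color a plot. The following is a minimal sketch; labels_from_clusters is a hypothetical helper, not part of pyclustering, and it marks points that appear in no cluster with -1.

```python
def labels_from_clusters(clusters, n_points):
    """Turn get_clusters()-style index lists into one label per point.

    Hypothetical helper for illustration: labels[i] is the position of the
    cluster that contains point i, or -1 if no cluster contains it.
    """
    labels = [-1] * n_points
    for cluster_id, indices in enumerate(clusters):
        for idx in indices:
            labels[idx] = cluster_id
    return labels


# A small, made-up get_clusters() result for 5 points:
print(labels_from_clusters([[0, 2, 3], [1, 4]], 5))  # [0, 1, 0, 0, 1]

# Before process() is called, get_clusters() returns [], so every point is -1:
print(labels_from_clusters([], 3))  # [-1, -1, -1]
```

Assuming the seaborn import in the question was meant for plotting, the labels could then be stored with something like df['Cluster'] = labels_from_clusters(cluster_clarans.get_clusters(), len(df)) and passed to sns.scatterplot(data=df, x='X', y='Y', hue='Cluster').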

Below is the complete code. Note that I have changed the size of the random data:

import pandas as pd
import numpy as np
import seaborn as sns
from pyclustering.cluster.clarans import clarans

x1 = np.random.normal(30, 10, 20)
x2 = np.random.normal(60, 5, 20)
x = np.concatenate((x1, x2), axis=0)
y1 = np.random.normal(20, 5, 20)
y2 = np.random.normal(40, 15, 20)
y = np.concatenate((y1, y2), axis=0)
Gr = np.array(['G1']*20 + ['G2']*20)

df = pd.DataFrame(x, columns=['X'])
df['Y'] = y
df['Gr'] = Gr

# Convert the X/Y columns into a plain 2-D list of [x, y] points
datalist = df[['X', 'Y']].round(2).values.tolist()

# Initialize the cluster object
cluster_clarans = clarans(datalist, 2, 6, 4)

# Process the data
cluster_clarans.process()

# Get the indices of the points in each cluster
print(cluster_clarans.get_clusters())
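# Example output (the data is random and CLARANS picks medoids randomly, so the indices vary between runs):
# [[9,14,15,16,20,21,27,28,29,30,31,32,33,34,35,37,38,39],
#  [0,1,2,3,4,5,6,7,8,10,11,12,13,17,18,19,22,23,24,25,36]]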
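Because the test data has a known group column (Gr), one way to sanity-check the result is to count how many points of each group ended up in each cluster. This is a minimal sketch; cluster_vs_group is a hypothetical helper, not a pyclustering function.

```python
from collections import Counter


def cluster_vs_group(clusters, groups):
    """Count how many points of each known group fall into each cluster.

    Hypothetical helper for illustration.
    clusters: index lists as returned by get_clusters().
    groups: one known label per point (e.g. the values of the Gr column).
    Returns one Counter per cluster.
    """
    return [Counter(groups[idx] for idx in indices) for indices in clusters]


groups = ['G1'] * 3 + ['G2'] * 3
print(cluster_vs_group([[0, 1, 5], [2, 3, 4]], groups))
# [Counter({'G1': 2, 'G2': 1}), Counter({'G2': 2, 'G1': 1})]
```

With the code above, the call would look like cluster_vs_group(cluster_clarans.get_clusters(), df['Gr'].tolist()); a clean separation shows up as each Counter being dominated by a single group.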

You can read more about the process() method here.
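For a rough picture of the work that happens inside process(), here is a simplified pure-Python sketch of the CLARANS search (randomized k-medoids, after Ng and Han). It is not pyclustering's implementation: clarans_sketch and its seed parameter are hypothetical, and it assumes Euclidean distance and fewer clusters than points. Here numlocal is the number of independent local searches and maxneighbor is how many random medoid swaps are tried without improvement before a local search stops.

```python
import math
import random


def clarans_sketch(data, k, numlocal, maxneighbor, seed=None):
    """Illustrative CLARANS search (hypothetical helper, not pyclustering's code).

    Returns (clusters, medoids): clusters is a list of index lists, the same
    shape get_clusters() returns; medoids are indices into data.
    Assumes Euclidean distance and k < len(data).
    """
    rng = random.Random(seed)
    n = len(data)

    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        return sum(min(math.dist(p, data[m]) for m in medoids) for p in data)

    best_medoids, best_cost = None, math.inf
    for _ in range(numlocal):
        # Each local search starts from k randomly chosen medoids.
        current = rng.sample(range(n), k)
        current_cost = cost(current)
        tries = 0
        while tries < maxneighbor:
            # A neighbor swaps one medoid for one random non-medoid point.
            i = rng.randrange(k)
            candidate = rng.choice([p for p in range(n) if p not in current])
            neighbor = current.copy()
            neighbor[i] = candidate
            neighbor_cost = cost(neighbor)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                tries = 0  # improvement found: restart the neighbor count
            else:
                tries += 1
        # Keep the best local minimum seen across all local searches.
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost

    # Assign every point to its nearest medoid.
    clusters = [[] for _ in range(k)]
    for idx, p in enumerate(data):
        nearest = min(range(k), key=lambda c: math.dist(p, data[best_medoids[c]]))
        clusters[nearest].append(idx)
    return clusters, best_medoids


points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [10.0, 10.0], [10.1, 10.2], [9.9, 10.1]]
clusters, medoids = clarans_sketch(points, 2, numlocal=4, maxneighbor=10, seed=1)
print(clusters, medoids)
```

Each cost evaluation touches every point, which is why CLARANS samples only a limited number of neighbors (maxneighbor) instead of examining every possible swap.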

