Scatter plot of 1-D bimodal data from sklearn make_blobs()

Question

The sklearn make_blobs() function can be used to generate isotropic Gaussian blobs for clustering.

I am trying to plot the data generated by the make_blobs() function.

import numpy as np
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=1, random_state=1)
a = plt.hist(arr, bins=np.arange(int(np.min(arr))-1,int(np.max(arr))+1,0.5), width = 0.3)

This code gives a normal distribution plot, which makes sense.

blobs, blob_labels = make_blobs(n_samples=1000, n_features=2, 
                                centers=2, random_state=1)

a = plt.scatter(blobs[:, 0], blobs[:, 1], c=blob_labels)

This code gives a plot with two clusters, which also makes sense.

I am wondering whether there is a way to plot the data generated by the make_blobs() function with the parameters centers=2 and n_features=1.

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=2, random_state=1)

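I've tried plt.hist(), which gives another normal distribution plot.

I have no idea how to use plt.scatter() with the data.

I can't imagine what the plot should look like.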

Answer 1

Your question is a bit unclear.

I've tried plt.hist(), which gives another normal distribution plot.

Well, not exactly; it gives a bimodal Gaussian mixture plot:

arr, blob_labels = make_blobs(n_samples=1000, n_features=1, 
                                centers=2, random_state=1)

a = plt.hist(arr, bins=np.arange(int(np.min(arr))-1,int(np.max(arr))+1,0.5), width = 0.3)

This is as expected, since we now have centers=2.
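One way to check that the two peaks really correspond to the two centers is to histogram each cluster separately. The following is only a sketch: the names `x`, `bins`, and `cluster_means` are introduced here for illustration, and the bin width of 0.5 is carried over from the snippet above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=1000, n_features=1,
                              centers=2, random_state=1)
x = arr[:, 0]  # flatten the (1000, 1) array to shape (1000,)

# Shared bin edges so both clusters are binned identically.
bins = np.arange(np.floor(x.min()), np.ceil(x.max()) + 0.5, 0.5)

fig, ax = plt.subplots()
for label in np.unique(blob_labels):
    # One semi-transparent histogram per cluster label.
    ax.hist(x[blob_labels == label], bins=bins, alpha=0.6,
            label=f"center {label}")
ax.set_xlabel("feature value")
ax.set_ylabel("count")
ax.legend()

# The per-cluster means should sit near the two peaks.
cluster_means = [x[blob_labels == label].mean()
                 for label in np.unique(blob_labels)]
```

Call plt.show() afterwards if you are not in an interactive session.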

I have no idea how to use plt.scatter() with the data.

By definition, a scatter plot requires 2-D data; from the docs:

A scatter plot of y vs x with varying marker size and/or color.

Here, since n_features=1, we actually have only x and no y.

A 1-D "scatter plot" is really just a line, and we can visualize it with plot, as nicely explained in How to plot 1-d data at given y-value with pylab; in your case:

val = 0. # this is the value where you want the data to appear on the y-axis.
a = plt.plot(arr, np.zeros_like(arr) + val, 'x')

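Of course, we should keep in mind that the vertical axis is only a visualization convenience; it carries no meaning for our data, which has no y values at all.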

Want to use different colors and/or markers for each center?

val = 0. # this is the value where you want the data to appear on the y-axis.
plt.plot(arr[blob_labels==0], np.zeros_like(arr[blob_labels==0]) + val, 'x', color='y')
plt.plot(arr[blob_labels==1], np.zeros_like(arr[blob_labels==1]) + val, '+', color='b')
plt.show()
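If you would rather use plt.scatter() itself, the same trick applies: supply a constant array as the y coordinate. This is a minimal sketch, not the only way to do it; `y` and `points` are names introduced here, and hiding the y ticks is just a suggestion to stress that the vertical axis is meaningless.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=1000, n_features=1,
                              centers=2, random_state=1)

val = 0.0  # arbitrary height at which to draw the points
y = np.full(arr.shape[0], val)  # constant dummy y coordinate

fig, ax = plt.subplots()
# arr has shape (1000, 1), so column 0 is x; color each point by its label.
points = ax.scatter(arr[:, 0], y, c=blob_labels, marker="x")
ax.set_yticks([])  # the y-axis carries no information here
ax.set_xlabel("feature value")
```

Passing c=blob_labels lets matplotlib map each cluster label to a color, so no per-cluster indexing is needed.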

With larger samples, things start to get more interesting; for n_samples=10000, notice the overlap between the two clusters.
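To reproduce that picture and see where the clusters meet, something like the following could be used. It is a sketch under my own assumptions: the overlap is taken to be the interval between the larger of the two cluster minima and the smaller of the two cluster maxima, and the names `x0`, `x1`, `lo`, `hi`, and `n_overlap` are introduced here for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

arr, blob_labels = make_blobs(n_samples=10000, n_features=1,
                              centers=2, random_state=1)
x0 = arr[blob_labels == 0, 0]  # samples drawn around the first center
x1 = arr[blob_labels == 1, 0]  # samples drawn around the second center

# Interval covered by both clusters; it is empty when lo > hi.
lo = max(x0.min(), x1.min())
hi = min(x0.max(), x1.max())
in_range = (arr[:, 0] >= lo) & (arr[:, 0] <= hi)
n_overlap = int(np.sum(in_range)) if lo <= hi else 0

val = 0.0  # arbitrary height at which to draw the points
fig, ax = plt.subplots()
ax.plot(x0, np.zeros_like(x0) + val, 'x', color='y')
ax.plot(x1, np.zeros_like(x1) + val, '+', color='b')
if lo <= hi:
    # Shade the region where points from both clusters occur.
    ax.axvspan(lo, hi, alpha=0.2, color='gray')
ax.set_yticks([])  # the y-axis carries no information here
```

Printing n_overlap tells you how many of the 10000 points fall inside the shaded region.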

