MiniBatchKMeans Python

I am using the MiniBatchKMeans() function from scikit-learn. Its documentation says:

batch_size : int, optional, default: 100 Size of the mini batches.

init_size : int, optional, default: 3 * batch_size Number of samples to randomly sample for speeding up the initialization (sometimes at the expense of accuracy): the only algorithm is initialized by running a batch KMeans on a random subset of the data. This needs to be larger than n_clusters.

I don't quite understand this, because it seems that the final size of the mini batch is 3 * batch_size rather than the value given by the batch_size parameter.

Am I misunderstanding something? If so, could someone explain these two parameters? If not, why do both parameters exist, since they seem redundant?

Thanks!

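The batch size is defined by batch_size, period. Separately, you can define init_size, which is the number of samples used for the initialization step; by default it is 3 * batch_size. For example, you can set batch_size=100 and init_size=10 (as long as init_size is larger than n_clusters, as the documentation requires). Then 10 samples are used to perform the initialization, and batches of 100 samples are used while the algorithm runs. Initialization gets its own parameter because k-means does not converge to a global optimum, so the result depends on the starting centers, and there are many techniques for handling this at the initialization stage.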
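To make the difference concrete, here is a minimal sketch that sets the two parameters independently. The synthetic data, the cluster count, and the random_state and n_init values are assumptions chosen for illustration and are not from the question. Both batch_size and init_size are passed explicitly because their defaults differ between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Illustrative data (an assumption): 900 two-dimensional points
# scattered around three well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(300, 2)) for c in centers])

model = MiniBatchKMeans(
    n_clusters=3,
    batch_size=100,  # number of samples drawn for each update step
    init_size=10,    # number of samples drawn only to choose the initial centers
    n_init=3,        # set explicitly; an arbitrary choice for this sketch
    random_state=0,
)
labels = model.fit_predict(X)

# One center per cluster, one coordinate per feature.
print(model.cluster_centers_.shape)  # (3, 2)
```

Here init_size only affects how the starting centers are chosen, while every later update step draws 100 samples. Leaving init_size unset falls back to the documented default that is derived from batch_size, which is where the 3 * batch_size figure in the question comes from.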
