Difference in "compare" between cluster objects and their memberships

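In igraph, what is the difference between applying the "compare" command to the results of clustering algorithms and applying it to their memberships?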

As given in the manual page:

compare(sg, le, method = "rand")
compare(membership(sg), membership(le))

I looked at the documentation for compare. Its usage is:

compare(comm1, comm2, method = c("vi", "nmi", "split.join", "rand", "adjusted.rand"))
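In R, a default such as method = c("vi", "nmi", ...) only lists the allowed values; when a function matches the argument with match.arg() and the caller omits it, the first entry is used. The sketch below uses a hypothetical compare_sketch() (not part of igraph) to show this base-R behaviour; that igraph's compare() resolves its default the same way is an assumption, though it is consistent with the output shown further down.

```r
# Hypothetical stand-in for igraph's compare(), only to show argument matching.
compare_sketch <- function(comm1, comm2,
                           method = c("vi", "nmi", "split.join",
                                      "rand", "adjusted.rand")) {
  # With no explicit 'method', match.arg() returns the first choice ("vi").
  method <- match.arg(method)
  method
}

compare_sketch(c(1, 1, 2), c(1, 2, 2))                   # "vi"
compare_sketch(c(1, 1, 2), c(1, 2, 2), method = "rand")  # "rand"
```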

The documentation describes the comm1 and comm2 arguments as follows (comm2 is analogous):

comm1 : A communities object containing a community structure; or a numeric vector, the membership vector of the first community structure. The membership vector should contain the community id of each vertex, the numbering of the communities starts with one.

The complete example code given at the end of the page is:

g <- make_graph("Zachary")
sg <- cluster_spinglass(g)
le <- cluster_leading_eigen(g)
compare(sg, le, method="rand")
compare(membership(sg), membership(le))

Now, the first case: compare(sg, le, method="rand")

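sg and le are the cluster objects themselves, i.e. the results of community detection via the spin-glass model and of community detection by calculating the leading non-negative eigenvector of the modularity matrix of the graph, respectively. In short, both contain the community structure of the data.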
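The "rand" method returns the Rand index: the fraction of vertex pairs on which two partitions agree, meaning both put the pair in the same community or both put it in different ones. A minimal base-R sketch, assuming this textbook definition; the helper rand_index() is hypothetical and is not igraph's implementation:

```r
# Rand index between two membership vectors (one community id per vertex).
rand_index <- function(memb1, memb2) {
  stopifnot(length(memb1) == length(memb2))
  pairs <- combn(length(memb1), 2)  # every unordered vertex pair, one per column
  same1 <- memb1[pairs[1, ]] == memb1[pairs[2, ]]  # together in partition 1?
  same2 <- memb2[pairs[1, ]] == memb2[pairs[2, ]]  # together in partition 2?
  mean(same1 == same2)  # share of pairs treated the same way by both
}

rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 1: same partition, different labels
rand_index(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 1/3
```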

Now, the second case: compare(membership(sg), membership(le))

This uses membership, which does the following:

membership gives the division of the vertices, into communities. It returns a numeric vector, one value for each vertex, the id of its community. Community ids start from one. Note that some algorithms calculate the complete (or incomplete) hierarchical structure of the communities, and not just a single partitioning. For these algorithms typically the membership for the highest modularity value is returned, but see also the manual pages of the individual algorithms

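You can read more about the function in its manual page. So, as you can see, membership returns a numeric vector holding the community id of each vertex, which is exactly the second type of value that the comm1 and comm2 arguments of compare accept.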
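To make that format concrete, here is a made-up membership vector for six vertices (not taken from the Zachary graph); base R's split() turns it back into the list of vertices in each community:

```r
# Hypothetical membership vector: entry i is the community id of vertex i,
# and ids start at 1 as the documentation requires.
memb <- c(1, 1, 2, 2, 2, 3)

# Group vertex indices by community id.
communities <- split(seq_along(memb), memb)
communities
# community 1: vertices 1, 2; community 2: vertices 3, 4, 5; community 3: vertex 6
```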

So both calls give compare the same two partitions; they are just two different ways of doing the same thing.

If you run the code given at the end of the documentation, you will see the following:

> g <- make_graph("Zachary")
> sg <- cluster_spinglass(g)
> le <- cluster_leading_eigen(g)
> compare(sg, le, method="rand")
[1] 0.9500891
> compare(membership(sg), membership(le))
[1] 0.2765712

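The results differ because the first call sets method to "rand", while the second call does not set method at all and therefore uses the default, "vi" (variation of information), which is the first entry of the method vector. If you also set method in the second call, you will get exactly the same result: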

> g <- make_graph("Zachary")
> sg <- cluster_spinglass(g)
> le <- cluster_leading_eigen(g)
> compare(sg, le, method="rand")
[1] 0.9500891
> compare(membership(sg), membership(le), method="rand")
[1] 0.9500891

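As you can see, both give the same result. Note that cluster_spinglass is randomized, so the exact value can change between runs unless you fix the seed with set.seed(); comparing the same sg and le objects, however, gives identical results for both forms.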
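When method is omitted, compare() uses "vi", so the 0.2765712 above is a variation of information (VI) value: a distance that is 0 for identical partitions, rather than a similarity like the Rand index. A base-R sketch of the standard definition VI = H(A) + H(B) - 2 I(A; B), using a hypothetical helper vi_distance(); the natural logarithm is an assumption about igraph's convention, so this sketch is for intuition and is not guaranteed to reproduce compare() exactly.

```r
# Variation of information between two membership vectors (natural log assumed).
vi_distance <- function(memb1, memb2) {
  n <- length(memb1)
  joint <- table(memb1, memb2) / n        # joint community distribution
  p1 <- rowSums(joint)                    # marginal of partition 1
  p2 <- colSums(joint)                    # marginal of partition 2
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }
  # Mutual information; empty cells contribute 0 (avoids 0 * log(0)).
  ratio <- joint / outer(p1, p2)
  mutual <- sum(ifelse(joint > 0, joint * log(ratio), 0))
  entropy(p1) + entropy(p2) - 2 * mutual
}

vi_distance(c(1, 1, 2, 2), c(2, 2, 1, 1))  # 0: identical partitions
vi_distance(c(1, 1, 2, 2), c(1, 2, 1, 2))  # 2 * log(2), about 1.386
```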

References: