Merging of clusters

I have a matrix that describes groups of objects.

n <- 6 # number of objects
group <- matrix(c(1,2,1,4,1,3,6,3,5,3,NA,NA,2,NA,2,NA,NA,6,NA,6,NA,NA,NA,NA,4,NA,NA,NA,NA,5),5,6)
colnames(group) <- colnames(group, do.NULL = FALSE, prefix = "obj.")
rownames(group) <- rownames(group, do.NULL = FALSE, prefix = "step.")
group #  an n-1 by n matrix
#        obj.1 obj.2 obj.3 obj.4 obj.5 obj.6
# step.1     1     3    NA    NA    NA    NA
# step.2     2     6    NA    NA    NA    NA
# step.3     1     3     2     6    NA    NA
# step.4     4     5    NA    NA    NA    NA
# step.5     1     3     2     6     4     5

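I want to create a matrix that records which clusters are merged at each step. This matrix is equivalent to the merge component of the object returned by the hclust function.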

merge <- matrix(c(-1, -2, 1, -4, 3, -3, -6, 2, -5, 4), 5, 2)
merge
#      [,1] [,2]
# [1,]   -1   -3
# [2,]   -2   -6
# [3,]    1    2
# [4,]   -4   -5
# [5,]    3    4

merge is an n-1 by 2 matrix. Row i of merge describes the merging of clusters at step i of the clustering. If an element j in the row is negative, then observation -j was merged at this stage. If j is positive then the merge was with the cluster formed at the (earlier) stage j of the algorithm. Thus negative entries in merge indicate agglomerations of singletons, and positive entries indicate agglomerations of non-singletons.
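
To make that description concrete, here is a minimal sketch that expands a merge matrix back into the objects contained in each step's cluster. The helper name merge_members is hypothetical (it is not part of hclust or any package); it only applies the rule quoted above, where a negative entry is a single observation and a positive entry refers to the cluster formed at an earlier row.

```r
# Hypothetical helper: list the objects in the cluster formed at each step.
merge_members <- function(merge) {
  members <- vector("list", nrow(merge))
  for (i in seq_len(nrow(merge))) {
    parts <- lapply(merge[i, ], function(j) {
      # negative entry: singleton observation -j
      # positive entry: cluster already built at step j
      if (j < 0) -j else members[[j]]
    })
    members[[i]] <- unlist(parts)
  }
  members
}

merge <- matrix(c(-1, -2, 1, -4, 3, -3, -6, 2, -5, 4), 5, 2)
merge_members(merge)
```

For the merge matrix above, each list element should hold the non-NA entries of the corresponding row of group, which is the reverse of the conversion I am looking for.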

I haven't found a simple solution. Is there a function for this?

Basically you have a set of groups (one per row)...

group
#        obj.1 obj.2 obj.3 obj.4 obj.5 obj.6
# step.1     1     3    NA    NA    NA    NA
# step.2     2     6    NA    NA    NA    NA
# step.3     1     3     2     6    NA    NA
# step.4     4     5    NA    NA    NA    NA
# step.5     1     3     2     6     4     5

...and you want to know which two earlier groups were merged to form each row.

I would start by creating a matrix that indicates whether each object appears in a particular row:

(hasObs <- sapply(seq_len(ncol(group)), function(i) rowSums(!is.na(group) & group == i)))
#        [,1] [,2] [,3] [,4] [,5] [,6]
# step.1    1    0    1    0    0    0
# step.2    0    1    0    0    0    1
# step.3    1    1    1    0    0    1
# step.4    0    0    0    1    1    0
# step.5    1    1    1    1    1    1

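I would then use it to build a matrix in which each element (i,j) gives the most recent row before i in which object j appeared (or -j if there is no such earlier row):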

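prevObs <- sapply(seq_len(ncol(hasObs)), function(i) {
  pos <- which(head(hasObs, -1)[,i] == 1)  # rows (excluding the last) that contain object i
  rep(c(-i, pos), diff(c(0, pos, nrow(hasObs))))
})
rownames(prevObs) <- rownames(hasObs)  # sapply otherwise carries over misleading names from pos
prevObs
#        [,1] [,2] [,3] [,4] [,5] [,6]
# step.1   -1   -2   -3   -4   -5   -6
# step.2    1   -2    1   -4   -5   -6
# step.3    1    2    1   -4   -5    2
# step.4    3    3    3   -4   -5    3
# step.5    3    3    3    4    4    3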

Now it is easy to determine which rows were merged to form the current row:

t(apply(hasObs*prevObs, 1, function(x) unique(x[x != 0])))
#        [,1] [,2]
# step.1   -1   -3
# step.2   -2   -6
# step.3    1    2
# step.4   -4   -5
# step.5    3    4

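The first row merges the single objects 1 and 3, the next row merges the single objects 2 and 6, the third row merges the first two groups, the fourth row merges the single objects 4 and 5, and the fifth row merges the groups formed in rows 3 and 4.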
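
If it helps, the three steps can be wrapped into a single function and checked against the expected result from the question. This is only a sketch: the name groups_to_merge is made up here, and it assumes that group has one column per object and that every step merges exactly two clusters (otherwise apply would not return a two-column result).

```r
# Hypothetical wrapper around the three steps above; assumes one column per
# object and exactly two clusters merged at every step.
groups_to_merge <- function(group) {
  n <- ncol(group)
  # hasObs[i, j] is 1 if object j belongs to the group in row i
  hasObs <- sapply(seq_len(n), function(i) rowSums(!is.na(group) & group == i))
  # prevObs[i, j] is the latest row before i containing object j, else -j
  prevObs <- sapply(seq_len(n), function(i) {
    pos <- which(head(hasObs, -1)[, i] == 1)
    rep(c(-i, pos), diff(c(0, pos, nrow(hasObs))))
  })
  res <- t(apply(hasObs * prevObs, 1, function(x) unique(x[x != 0])))
  dimnames(res) <- NULL  # drop labels so the result compares cleanly
  res
}

group <- matrix(c(1,2,1,4,1,3,6,3,5,3,NA,NA,2,NA,2,NA,NA,6,NA,6,
                  NA,NA,NA,NA,4,NA,NA,NA,NA,5), 5, 6)
merge <- matrix(c(-1, -2, 1, -4, 3, -3, -6, 2, -5, 4), 5, 2)
all.equal(groups_to_merge(group), merge)
```

For the example data the final comparison should come back TRUE, since the function reproduces the merge matrix given in the question.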