Converting a table to a matrix for clustering analysis

Question

I have a table that records how often (N) two variables (V1 and V2) occur together. Here is a sample:

> dput(ans)
structure(list(V1 = c(2L, 7L, 7L, 7L, 7L, 7L, 9L, 9L, 9L, 10L, 
10L, 11L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 14L, 14L, 
14L, 14L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L, 20L, 
20L, 21L, 25L, 29L, 29L, 29L, 33L, 35L, 38L, 42L, 46L, 46L, 46L, 
46L, 46L, 46L, 46L, 46L, 46L, 46L, 46L, 46L, 47L, 47L, 48L, 52L, 
52L, 52L, 52L, 52L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 57L, 57L, 
57L, 57L, 57L, 57L, 58L, 58L, 58L, 58L, 58L, 59L, 59L, 59L, 59L, 
60L, 60L, 60L, 61L, 61L, 62L, 65L, 65L, 65L, 65L, 67L, 67L, 67L, 
68L, 70L, 70L, 71L, 73L, 73L, 74L), V2 = c(3L, 8L, 20L, 21L, 
22L, 78L, 10L, 11L, 12L, 11L, 12L, 12L, 38L, 39L, 14L, 15L, 16L, 
17L, 18L, 29L, 64L, 15L, 16L, 17L, 18L, 16L, 17L, 18L, 17L, 18L, 
29L, 30L, 18L, 29L, 30L, 21L, 22L, 22L, 26L, 30L, 47L, 64L, 34L, 
36L, 39L, 43L, 47L, 48L, 49L, 52L, 65L, 67L, 70L, 71L, 72L, 73L, 
74L, 75L, 48L, 49L, 49L, 65L, 67L, 73L, 74L, 75L, 57L, 58L, 59L, 
60L, 61L, 62L, 63L, 58L, 59L, 60L, 61L, 62L, 63L, 59L, 60L, 61L, 
62L, 63L, 60L, 61L, 62L, 63L, 61L, 62L, 63L, 62L, 63L, 63L, 67L, 
73L, 74L, 75L, 73L, 74L, 75L, 69L, 71L, 72L, 72L, 74L, 75L, 75L
), N = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
 row.names = c(NA, -108L), class = c("data.table", "data.frame"))

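I want to convert it into a 696x696 matrix with V1 as the rows and V2 as the columns (both running from 1 to 696) and N as the values. V1 and V2 represent materials in my dataset. If a V1/V2 combination does not appear in the table, its value should be 0. I need this because I want to cluster the materials by how often they occur together, using hclust with the centroid method.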

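Edit: The only way I can give an example of the expected output is a picture from an article I am following (image not reproduced here).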

Answer 1

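This is a common task with rasters. Using the raster package and converting the result back to a matrix may not be the fastest solution, but it works for your test data (named df here):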

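library(raster)

# Empty 696 x 696 raster whose cells are 1 x 1 units, filled with zeros
r <- raster(nrow = 696, ncol = 696, crs = NA,
            xmn = 0, xmx = 696, ymn = 0, ymx = 696)
r[] <- 0

# Map (V1, V2) to cell-centre coordinates: x runs along the columns (V2),
# y runs bottom-up, so row V1 counted from the top sits at y = 696.5 - V1
new_xy <- cbind(df$V2 - 0.5, 696.5 - df$V1)
cells <- cellFromXY(r, new_xy)
r[cells] <- df$N
r <- as.matrix(r)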

We can then check with str(r) that the result is a 696x696 numeric matrix, and that max(r) is 3, as expected. Also, r[2, 3] is 1.
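For comparison, the same kind of matrix can be built without any package by indexing a zero matrix with a two-column (row, column) index. This is a minimal base R sketch, not the raster approach used above; the names sample_df and m are chosen here for illustration, and the three rows are copied from the question's table.

```r
# Three rows taken from the question's table (V1, V2, N)
sample_df <- data.frame(V1 = c(2L, 7L, 16L),
                        V2 = c(3L, 8L, 17L),
                        N  = c(1L, 1L, 3L))

n <- 696                             # number of materials, as stated in the question
m <- matrix(0, nrow = n, ncol = n)   # every pair starts at 0

# A two-column matrix used as an index addresses (row, column) pairs directly
m[cbind(sample_df$V1, sample_df$V2)] <- sample_df$N

# The same checks as described above
str(m)
max(m)
m[2, 3]
```

If the table could contain the same (V1, V2) pair more than once, the later value would overwrite the earlier one with this indexing, so the counts would need to be aggregated first.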

Answer 2

To replicate the picture you added to the original question, I would do this:

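library(Matrix)  # provides sparseMatrix()

# convert your contingency table to the appropriate matrix
M <- sparseMatrix(i = df$V1, j = df$V2, x = df$N, dims = c(696, 696))
M <- as.matrix(M)  # dense base R matrix; pairs absent from the table are 0
rownames(M) <- 1:696
colnames(M) <- 1:696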
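With a matrix like M in hand, the clustering step mentioned in the question might be sketched as follows. Co-occurrence counts are similarities rather than distances, so some conversion is needed before calling hclust; the one used here (mirror the matrix, then subtract each count from the maximum) is only an assumption made for illustration, as are the toy 5x5 data and the names toy, S and D. The centroid method is usually applied to squared Euclidean distances, so results on a distance derived this way should be read with care.

```r
# Toy co-occurrence counts for 5 materials (hypothetical values)
toy <- data.frame(V1 = c(1L, 1L, 2L, 4L),
                  V2 = c(2L, 3L, 3L, 5L),
                  N  = c(3L, 1L, 2L, 4L))

n <- 5
M <- matrix(0, n, n, dimnames = list(1:n, 1:n))
M[cbind(toy$V1, toy$V2)] <- toy$N

# Each pair is stored once (V1 < V2), so mirror it to get a symmetric matrix
S <- M + t(M)

# Assumed conversion: frequent co-occurrence -> small distance
D <- max(S) - S
diag(D) <- 0

# Hierarchical clustering with the centroid method, cut into two groups
hc <- hclust(as.dist(D), method = "centroid")
groups <- cutree(hc, k = 2)
groups
```

plot(hc) would draw the dendrogram. Other similarity-to-distance conversions are possible and may suit the real data better.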

There are many formatting options for displaying a matrix as an image, but to get started, try:

View(M)
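As one possible next step beyond View(), base R's image() can draw the matrix as a heat map. The sketch below uses a small hypothetical 6x6 matrix in place of the real 696x696 one. Because image() puts rows along the x-axis with the first column at the bottom, the matrix is transposed and its row order reversed so that M[1, 1] lands in the top-left corner, matching the printed layout.

```r
# Small hypothetical co-occurrence matrix standing in for the 696x696 one
M <- matrix(0, 6, 6, dimnames = list(1:6, 1:6))
M[cbind(c(1, 2, 4), c(2, 3, 6))] <- c(1, 3, 1)

# Reorient so that M[1, 1] appears in the top-left corner of the plot
z <- t(M)[, nrow(M):1]

image(x = 1:ncol(M), y = 1:nrow(M), z = z,
      col = rev(heat.colors(12)),
      xlab = "V2", ylab = "V1", axes = FALSE)
axis(1, at = 1:ncol(M), labels = colnames(M))
axis(2, at = 1:nrow(M), labels = rev(rownames(M)))
```

For the full matrix, labelling every one of the 696 rows and columns would be unreadable, so the axis() calls would likely need a thinned set of tick positions.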

