How to summarise pam clustering results in R?
If I try to run the code below to get a summary of the clustering results, I get the following error:
Error in UseMethod("mutate_") : no applicable method for 'mutate_'
applied to an object of class "table"
This code works if dat is a data frame, but if it is a table I get the error message above. Does anyone have a way to solve this?
pam_fit <- pam(gower_dist, diss = TRUE, k) # performs cluster analysis
pam_results <- dat %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.))
pam_results$the_summary
Example dataset:
set.seed(1)
dat <- data.frame(ID = rep(sample(c("a","b","c","d","e","f","g"),10,replace = TRUE),70),
                  disease = sample(c("flu","headache","pain","inflammation","depression","infection","chest pain"),100,replace = TRUE))
dat <- unique(dat)
dat2 <- table(dat)
dat3 <- as.data.frame(dat)
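If you look at dat, each ID has multiple observations, and you are trying to cluster the IDs based on their disease column. So the clustering result should have one entry per ID, and if you want to summarise the result you can do so by cluster.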
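The error in the question comes from passing a table to mutate(), which only has methods for data-frame-like objects. As a minimal sketch on hypothetical toy data (the `toy`, `tab`, `long` and `wide` names are illustrative and not from the question), the two base R conversions behave differently: as.data.frame() gives a long table of counts, while as.data.frame.matrix() keeps the one-row-per-ID layout that lines up with the cluster labels.

```r
# Hypothetical toy data (not the question's dat): one row per ID/disease pair
toy <- data.frame(
  ID      = c("a", "a", "b", "c", "c"),
  disease = c("flu", "pain", "flu", "pain", "headache")
)

tab <- table(toy)                  # 2-D contingency table, class "table"
inherits(tab, "data.frame")        # FALSE, so mutate() has no method for it

long <- as.data.frame(tab)         # long format: columns ID, disease, Freq
wide <- as.data.frame.matrix(tab)  # wide format: one row per ID, one column per disease

dim(long)  # 9 rows (3 IDs x 3 diseases), 3 columns
dim(wide)  # 3 rows, 3 columns
```

The wide form is the one to use here, because a clustering of IDs returns one label per row of that table.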
To put the tables together, run:
library(cluster)
library(tidyverse)
pam_fit <- pam(daisy(dat2,"gower"), diss = TRUE, 2) # performs cluster analysis
pam_results <- as.data.frame.matrix(table(dat)) %>%
  mutate(cluster = pam_fit$clustering) %>%
  group_by(cluster) %>%
  do(the_summary = summary(.),freq = colSums(.))
The summary looks like this:
pam_results$freq
[[1]]
  chest pain   depression          flu     headache    infection inflammation
           4            5            4            3            5            3
        pain      cluster
           5            5

[[2]]
  chest pain   depression          flu     headache    infection inflammation
           1            2            2            2            2            2
        pain      cluster
           0            4
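The same per-cluster summary can be sketched without do(), which recent dplyr releases mark as superseded. The example below is an illustration on hypothetical toy data (the `wide` table and its values are made up, and `by_cluster`, `summaries` and `freq` are illustrative names); it assumes only the cluster package and base R's split() and lapply().

```r
library(cluster)  # provides daisy() and pam()

# Hypothetical wide table (not the question's data): one row per ID
wide <- data.frame(
  flu      = c(1, 1, 1, 0, 0, 0),
  headache = c(1, 0, 1, 0, 1, 0),
  pain     = c(0, 0, 0, 1, 1, 1),
  row.names = c("a", "b", "c", "d", "e", "f")
)

pam_fit <- pam(daisy(wide, metric = "gower"), k = 2, diss = TRUE)

# pam() returns one cluster label per row of the wide table
length(pam_fit$clustering) == nrow(wide)

# Split the rows by cluster label, then summarise each piece separately
by_cluster <- split(wide, pam_fit$clustering)
summaries  <- lapply(by_cluster, summary)  # same idea as do(the_summary = summary(.))
freq       <- lapply(by_cluster, colSums)  # same idea as do(freq = colSums(.))
```

Because the cluster label is used only as the splitting key, it does not show up as an extra column in the sums, unlike the cluster entry in the output above.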
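If you only need the frequencies, you can simply do the following. Note that dat2[,-1] drops the first disease column (chest pain), so it is missing from the output; pass dat2 instead to keep all seven diseases: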
aggregate(as.data.frame.matrix(dat2[,-1]),list(cluster=pam_fit$clustering),sum)
  cluster depression flu headache infection inflammation pain
1       1          5   4        3         5            3    5
2       2          2   2        2         2            2    0
Or a dplyr solution:
as.data.frame.matrix(dat2[,-1]) %>%
  mutate(cluster=pam_fit$clustering) %>%
  group_by(cluster) %>%
  summarize_all(sum)
# A tibble: 2 x 7
  cluster depression   flu headache infection inflammation  pain
    <int>      <int> <int>    <int>     <int>        <int> <int>
1       1          5     4        3         5            3     5
2       2          2     2        2         2            2     0
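In dplyr 1.0 and later, summarize_all() is superseded by across(). A minimal sketch of the equivalent call, assuming dplyr >= 1.0 and using a hypothetical wide table and a hand-written label vector `cl` in place of pam_fit$clustering:

```r
library(dplyr)

# Hypothetical wide table: one row per ID, one 0/1 column per disease
wide <- data.frame(
  flu  = c(1L, 1L, 0L, 0L),
  pain = c(0L, 1L, 1L, 1L),
  row.names = c("a", "b", "c", "d")
)
cl <- c(1L, 1L, 2L, 2L)  # hypothetical labels, standing in for pam_fit$clustering

freq <- wide %>%
  mutate(cluster = cl) %>%
  group_by(cluster) %>%
  summarise(across(everything(), sum))  # sums every non-grouping column per cluster
```

Here everything() selects all columns except the grouping column, so cluster is kept as the key and is not summed.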