Reordering a ggplot2 heatmap based on hierarchical clustering

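I'm struggling with ggplot2. Although I found very similar questions, I couldn't get it working. I want to reorder a heatmap by both columns and rows, based on hierarchical clustering.

Here is my actual code: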

# import
library("ggplot2")
library("scales")
library("reshape2")

# data loading
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t')

# clustering with hclust on row and on column
dd.col <- as.dendrogram(hclust(dist(data_frame)))
dd.row <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
col.ord <- order.dendrogram(dd.col)
row.ord <- order.dendrogram(dd.row)


# making a new, reordered data frame
new_df = as.data.frame(data_frame[col.ord, row.ord])
print(new_df)   # when inspecting new_df manually, the ordering looks correct

# get the row name
name = as.factor(row.names(new_df))

# reshape
melte_df = melt(cbind(name, new_df))

# the solution is to reorder the factor levels of the name column
# (col.ord indexes rows here, because dd.col was built from dist(data_frame))
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[as.vector(col.ord)])

# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value),
 colour = "white") + scale_fill_gradient(low = "white",
 high = "steelblue") + theme(text=element_text(size=12),
 axis.text.y=element_text(size=3)))

# save fig
ggsave(file = "test.pdf")

# the result is ordered only by column; what have I missed?

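I'm quite new to R, so a detailed answer would be much appreciated.

Without a reproducible example dataset I can't be 100% sure this is the cause, but my guess is that your problem lies in this line: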

name = as.factor(row.names(new_df))

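When you use a factor, ordering is based on the order of that factor's levels. You can reorder the rows of your data frame however you like, but the order used when plotting will still be the order of the levels.

Here is an example: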

data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
data_frame
       x  y
1  apple 50
2 banana 30
3  peach 70

data_frame$x <- as.factor(data_frame$x) # Make x column a factor

levels(data_frame$x) # This shows the levels of your factor
[1] "apple"  "banana" "peach" 

data_frame <- data_frame[order(data_frame$y),] # Order by value of y
data_frame
       x  y
2 banana 30
1  apple 50
3  peach 70

# Now let's plot it:
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

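Here is the result:

See? It is not ordered by the value of y as we wanted. It is ordered by the levels of the factor. Now, if that is indeed where your problem lies, there are solutions here: R - Order a factor based on value in one or more other columns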
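If you would rather not pull in another package, the same reordering can be done in base R. This is only a minimal sketch on the toy data from above; the names df, by_levels and by_reorder are made up for illustration. The first variant sets the levels explicitly, the second lets reorder() derive them from y.

```r
# Toy data mirroring the example above
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Variant 1: set the levels explicitly, in order of increasing y
by_levels <- factor(df$x, levels = df$x[order(df$y)])
levels(by_levels)
# [1] "banana" "apple"  "peach"

# Variant 2: let reorder() sort the levels by the value of y
by_reorder <- reorder(factor(df$x), df$y)
levels(by_reorder)
# [1] "banana" "apple"  "peach"
```

Either result can be assigned back to df$x before plotting. reorder() is handy when the order should follow a numeric column, while the explicit levels argument works for any order you already have, such as one coming from a clustering.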

An example applying the dplyr solution:

library(dplyr)
data_frame <- data_frame %>%
       arrange(y) %>%          # sort your dataframe
       mutate(x = factor(x, levels = x)) # reset the factor levels to follow that order

data_frame
       x  y
1 banana 30
2  apple 50
3  peach 70

levels(data_frame$x) # Levels of the factor are reordered!
[1] "banana" "apple"  "peach" 

p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p

Here is the result now:

Hope this helps. Otherwise, you may want to provide a sample of your original dataset!
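For the heatmap itself, the same idea would apply to both axes. Below is a rough sketch, not your exact pipeline: since your data is not available, it assumes a small random matrix, and the names mat, row_ord, col_ord and long_df are made up for illustration. It also uses the order component of the hclust result, which I would expect to match what order.dendrogram() gives in your code.

```r
library(ggplot2)

# Synthetic stand-in for the real data (hypothetical values)
set.seed(1)
mat <- matrix(rnorm(40), nrow = 8,
              dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5)))

# dist() compares rows, so transpose the matrix to cluster the columns
row_ord <- hclust(dist(mat))$order
col_ord <- hclust(dist(t(mat)))$order

# Long format built with base R; as.vector() walks the matrix column by column
long_df <- data.frame(
  name     = rep(rownames(mat), times = ncol(mat)),
  variable = rep(colnames(mat), each = nrow(mat)),
  value    = as.vector(mat)
)

# The plot order follows the factor levels, so set them for both axes
long_df$name     <- factor(long_df$name,     levels = rownames(mat)[row_ord])
long_df$variable <- factor(long_df$variable, levels = colnames(mat)[col_ord])

p <- ggplot(long_df, aes(variable, name)) +
  geom_tile(aes(fill = value), colour = "white") +
  scale_fill_gradient(low = "white", high = "steelblue")
p
```

Note that reordering the rows and columns of the data frame before melting is not needed here, because only the factor levels decide where each tile ends up.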