Reordering a ggplot2 heatmap based on hierarchical clustering
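I'm struggling with ggplot2, and although I found very similar questions, I couldn't get it to work. I want to reorder a heatmap by column and by row based on hierarchical clustering.
Here is my actual code: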
# import
library("ggplot2")
library("scales")
library("reshape2")
# data loading
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t')
# clustering with hclust on row and on column
dd.col <- as.dendrogram(hclust(dist(data_frame)))
dd.row <- as.dendrogram(hclust(dist(t(data_frame))))
# ordering based on clustering
col.ord <- order.dendrogram(dd.col)
row.ord <- order.dendrogram(dd.row)
# making a new data frame reordered
new_df = as.data.frame(data_frame[col.ord, row.ord])
print(new_df) # when inspecting new_df manually, the reordering seems to work
# get the row name
name = as.factor(row.names(new_df))
# reshape
melte_df = melt(cbind(name, new_df))
# the solution is to reorder the factor levels of the name column
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[as.vector(row.ord)])
# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) + geom_tile(aes(fill = value),
colour = "white") + scale_fill_gradient(low = "white",
high = "steelblue") + theme(text=element_text(size=12),
axis.text.y=element_text(size=3)))
# save fig
ggsave(file = "test.pdf")
# the result is ordered only by column; what have I missed?
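I'm new to R, so I'd appreciate it if you could explain your answer in detail.
Without a reproducible example dataset I can't be 100% sure this is the cause, but my guess is that your problem lies in this line: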
name = as.factor(row.names(new_df))
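When you use a factor, the ordering is based on the order of that factor's levels. You can reorder the data frame however you like; the order used when plotting will still be the order of the levels.
Here is an example: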
data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
data_frame
x y
1 apple 50
2 banana 30
3 peach 70
data_frame$x <- as.factor(data_frame$x) # Make x column a factor
levels(data_frame$x) # This shows the levels of your factor
[1] "apple" "banana" "peach"
data_frame <- data_frame[order(data_frame$y),] # Order by value of y
data_frame
x y
2 banana 30
1 apple 50
3 peach 70
# Now let's plot it:
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p
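Here is the result (the bars appear in the order apple, banana, peach):
See? It isn't ordered by the value of y as we wanted. It is ordered by the levels of the factor. Now, if that is indeed where your problem lies, there is a solution here: R - Order a factor based on value in one or more other columns.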
An example applying the dplyr solution:
library(dplyr)
data_frame <- data_frame %>%
arrange(y) %>% # sort your dataframe
mutate(x = factor(x,x)) # reset your factor-column based on that order
data_frame
x y
1 banana 30
2 apple 50
3 peach 70
levels(data_frame$x) # Levels of the factor are reordered!
[1] "banana" "apple" "peach"
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y))
p
Here is the result now (the bars appear in the order banana, apple, peach):
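If you would rather not load dplyr, the same reordering can be sketched in base R. This is only a minimal illustration on the toy data from the example above; the name `df` is mine, not from the question.

```r
# Toy data from the example above (df is a hypothetical name)
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Set the factor levels to the x values sorted by y,
# so plots follow the y order instead of the alphabetical order
df$x <- factor(df$x, levels = df$x[order(df$y)])

levels(df$x)  # "banana" "apple" "peach"
```

The row order of `df` is left untouched here; only the level order changes, and that is what ggplot2 uses to place a discrete axis.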
Hope this helps; otherwise, you may need to provide a sample of your original dataset!
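Applied to the heatmap in the question, the same idea might look like the sketch below. It assumes a small random matrix (`mat`, with made-up `gene*` and `sample*` names) in place of the data read from `input_file`, and it builds the long table by hand rather than with `melt`. Note that `dist()` computes distances between rows, so in the question `col.ord` appears to hold the row order and `row.ord` the column order. If that is the case, the levels of the `name` factor would need `col.ord` (or simply `row.names(new_df)`), which may be why only the columns came out ordered.

```r
# Hypothetical toy matrix standing in for the data read from input_file
set.seed(1)
mat <- matrix(rnorm(30), nrow = 6,
              dimnames = list(paste0("gene", 1:6), paste0("sample", 1:5)))

# dist() works on rows: cluster rows on mat, columns on t(mat)
row_ord <- hclust(dist(mat))$order
col_ord <- hclust(dist(t(mat)))$order

# Long format: one line per cell (column-major, matching as.vector)
long <- data.frame(
  name     = rownames(mat)[row(mat)],
  variable = colnames(mat)[col(mat)],
  value    = as.vector(mat)
)

# The key step: set the factor levels of BOTH axes to the clustering order
long$name     <- factor(long$name,     levels = rownames(mat)[row_ord])
long$variable <- factor(long$variable, levels = colnames(mat)[col_ord])

# Build the plot only if ggplot2 is installed; print(p) draws it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(variable, name)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue")
}
```

Because the order lives in the factor levels, reordering the rows of `long` afterwards does not change the plot.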