Ordering boxplots based on two factor variables (on the x-axis) with ggplot2

I have a large data frame with 3 variables: symbol, vaf, Gene.function. (Link to df: https://www.dropbox.com/s/y6ykbzuy8x19psp/df_SO.txt?dl=0)

 dim(df)
[1] 2021    3

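I am trying to create a figure with multiple boxplots, with symbol on the x-axis and the boxes arranged according to the variable Gene.function. I do not care about the order of the categories along the x-axis, but I do want all genes (symbols) of the same category to appear one after another, just like in the example here: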

The closest I have come is with the forcats library, but for some reason not all genes with the same Gene.function end up sorted together. This is the code I used:

B <- df %>%
  mutate(symbol = fct_reorder(symbol, Gene.function)) %>%
  ggplot(aes(x = factor(symbol), y = vaf, fill = factor(Gene.function), color = factor(Gene.function))) +
  geom_boxplot() +
  scale_y_continuous(labels = function(x) paste0(x * 100, '%')) +
  xlab('') + 
  ylab('') + 
  ggtitle ('VAF distribution')+
  guides(fill = 'none')+
  theme_classic() + 
  theme(legend.position = "right",
        axis.text.x = element_text(angle = 90, size = 10, hjust = 0.5, vjust = 0.5, color = "black"),
        axis.text.y = element_text(size = 8, color = "black"),
        axis.title = element_text(size = 12), 
        plot.title = element_text(size = 14, face = 'italic'))

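I think the problem is that none of the variables I am using is numeric; instead, both are factors (symbol and Gene.function). In fact, when running the code above, I get the following warnings: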

There were 24 warnings (use warnings() to see them)
Warning messages:
1: Problem with `mutate()` input `symbol`.
i argument is not numeric or logical: returning NA
i Input `symbol` is `fct_reorder(symbol, Gene.function)`.
2: Problem with `mutate()` input `symbol`.
i argument is not numeric or logical: returning NA
i Input `symbol` is `fct_reorder(symbol, Gene.function)`.
3: Problem with `mutate()` input `symbol`.
i argument is not numeric or logical: returning NA
i Input `symbol` is `fct_reorder(symbol, Gene.function)`.
4: Problem with `mutate()` input `symbol`.
i argument is not numeric or logical: returning NA
i Input `symbol` is `fct_reorder(symbol, Gene.function)`.
(...)

Can anyone give me a hint? Thanks a lot!

You need to sort df by Gene.function and symbol first, and then use the resulting order of symbol to build the correct factor level order:

library(ggplot2)
library(dplyr)

# Sort rows by category, then by gene, and keep each symbol once in that order
level_info <- df %>%
  arrange(Gene.function, symbol) %>% 
  pull(symbol) %>% 
  unique()

df %>%
  mutate(Gene.function = as.factor(Gene.function),
         symbol = factor(symbol, levels = level_info)) %>%
  ggplot(aes(x = symbol, y = vaf, fill = Gene.function, color = Gene.function)) +
  geom_boxplot() +
  scale_y_continuous(labels = function(x) paste0(x * 100, '%')) +
  xlab('') + 
  ylab('') + 
  ggtitle ('VAF distribution')+
  guides(fill = 'none')+
  theme_classic() + 
  theme(legend.position = "right",
        axis.text.x = element_text(angle = 90, size = 10, hjust = 0.5, vjust = 0.5, color = "black"),
        axis.text.y = element_text(size = 8, color = "black"),
        axis.title = element_text(size = 12), 
        plot.title = element_text(size = 14, face = 'italic'))
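
To see why this works, it may help to check the level order on a small made-up data set. fct_reorder summarises its second argument with median() by default, which is meant for numeric values, so a non-numeric Gene.function is not a good fit for it. Setting the levels by hand avoids that. The sketch below uses base R only (order() and unique() in place of arrange(), pull() and unique()), and the toy values are hypothetical, not taken from the linked file. It also assumes that each symbol belongs to exactly one Gene.function.

```r
# Toy data standing in for df (hypothetical values, for illustration only)
toy <- data.frame(
  symbol        = c("TP53", "KRAS", "DNMT3A", "TP53", "TET2", "KRAS", "NRAS", "DNMT3A"),
  vaf           = c(0.42, 0.10, 0.35, 0.48, 0.22, 0.15, 0.05, 0.31),
  Gene.function = c("Tumor suppressor", "Signaling", "Epigenetic", "Tumor suppressor",
                    "Epigenetic", "Signaling", "Signaling", "Epigenetic"),
  stringsAsFactors = FALSE
)

# Sort rows by category, then by symbol; unique() keeps the first occurrence
# of each symbol, so the result lists the symbols category by category
sorted_rows <- toy[order(toy$Gene.function, toy$symbol), ]
level_info  <- unique(sorted_rows$symbol)

# Use that order as the factor levels; ggplot2 draws a discrete x-axis in level order
toy$symbol <- factor(toy$symbol, levels = level_info)

# Look up the category of each level, in axis order
level_category <- toy$Gene.function[match(levels(toy$symbol), toy$symbol)]

# Each category should form exactly one contiguous run along the axis
runs <- rle(level_category)
stopifnot(!anyDuplicated(runs$values))

print(data.frame(symbol = levels(toy$symbol), Gene.function = level_category))
```

If the stopifnot() check passes on the real df as well, the boxes of each Gene.function will sit next to each other on the x-axis.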