How to make box plots for two types of samples using percentiles in R

I have data that looks like this:

df <- data.frame(
  Group = c("1", "2", "3", "4"), 
  GOOD_0 = c(1L, 1L, 1L, 1L), 
  GOOD_25 = c(61.25, 1, 1, 1), 
  GOOD_50 = c(119, 1, 1, 1), 
  GOOD_75 = c(153, 1, 1, 1), 
  GOOD_100 = c(237L, 1L, 1L, 1L), 
  SALINE_0 = c(1L, 1L, 1L, 1L), 
  SALINE_25 = c(1, 40.25, 1, 22.5), 
  SALINE_50 = c(1, 86, 52.5, 122.5), 
  SALINE_75 = c(1, 136, 101.5, 269.25), 
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

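I want to plot box plots for the GOOD and SALINE types side by side (possibly in two different colors). The numbers after GOOD_ and SALINE_ are their percentiles. How can I make box plots for the Groups in R using these percentiles?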

I can do it for the GOOD type like this, but I can't get the SALINE boxes into the same plot:

ggplot(df, aes(x=Group, ymin = GOOD_0, lower = GOOD_25, middle = GOOD_50, upper = GOOD_75, ymax = GOOD_100)) +
      geom_boxplot(stat = "identity")

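This is easy to do if you transform your data a little. ggplot works best with data in long format, so reshape your dataframe as shown here and add a column that identifies which type each row belongs to, SALINE or GOOD.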

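I'm assuming your x variable is Group, since x does not exist in the data the way you used it in aes(x=x ...).

library(dplyr)

# Split out each type and give both the same percentile column names
GOOD <- df %>% select(Group, starts_with("GOOD")) %>% rename(Percentile_0 = GOOD_0, 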
                                                     Percentile_25 = GOOD_25, 
                                                     Percentile_50 = GOOD_50, 
                                                     Percentile_75 = GOOD_75, 
                                                     Percentile_100 = GOOD_100) 
SALINE <- df %>% select(Group, starts_with("SALINE")) %>% rename(Percentile_0 = SALINE_0, 
                                                       Percentile_25 = SALINE_25, 
                                                       Percentile_50 = SALINE_50, 
                                                       Percentile_75 = SALINE_75, 
                                                       Percentile_100 = SALINE_100) 


new_df <- bind_rows(GOOD %>% mutate(grp = "GOOD"), SALINE %>% mutate(grp = "SALINE"))

new_df
# A tibble: 8 x 7
  Group Percentile_0 Percentile_25 Percentile_50 Percentile_75 Percentile_100 grp   
  <fct>        <int>         <dbl>         <dbl>         <dbl>          <int> <chr> 
1 1                1          61.2         119            153             237 GOOD  
2 2                1           1             1              1               1 GOOD  
3 3                1           1             1              1               1 GOOD  
4 4                1           1             1              1               1 GOOD  
5 1                1           1             1              1              60 SALINE
6 2                1          40.2          86            136             360 SALINE
7 3                1           1            52.5          102.            222 SALINE
8 4                1          22.5         122.           269.            508 SALINE
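
The same long-format table can probably be built in one step instead of splitting and re-binding. The following is a minimal sketch, assuming the tidyr package is available and that every percentile column name has exactly one underscore (as in the data above). The name new_df_alt is only illustrative and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same wide data as in the question
df <- data.frame(
  Group = c("1", "2", "3", "4"),
  GOOD_0 = c(1L, 1L, 1L, 1L),
  GOOD_25 = c(61.25, 1, 1, 1),
  GOOD_50 = c(119, 1, 1, 1),
  GOOD_75 = c(153, 1, 1, 1),
  GOOD_100 = c(237L, 1L, 1L, 1L),
  SALINE_0 = c(1L, 1L, 1L, 1L),
  SALINE_25 = c(1, 40.25, 1, 22.5),
  SALINE_50 = c(1, 86, 52.5, 122.5),
  SALINE_75 = c(1, 136, 101.5, 269.25),
  SALINE_100 = c(60L, 360L, 222L, 508L)
)

# Split each column name at "_": the first part becomes the type (grp),
# the second part (".value") becomes the name of an output column
new_df_alt <- df %>%
  pivot_longer(
    cols = -Group,
    names_to = c("grp", ".value"),
    names_sep = "_"
  ) %>%
  # Prefix the percentile columns so they match the names used in this answer
  rename_with(~ paste0("Percentile_", .x), .cols = -c(Group, grp))
```

The result should hold the same eight rows and the same Percentile_* columns as new_df, although the row and column order may differ from the bind_rows version.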

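There are several ways to do what I did above. Once that is done, plotting both types is straightforward, and if you specify the colour aesthetic, ggplot will create a legend for you. So: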

new_df %>% ggplot(aes(x = Group, group = grp, colour = grp)) +
           geom_boxplot(stat = "identity", 
                        aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50, upper = Percentile_75, ymax = Percentile_100))
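
Since the question asked for two different colours, here is a hedged variation of the plot above that sets them explicitly. It is a sketch rather than the only way to do it: the colour names, the axis label, and the legend title are arbitrary choices of mine, and it relies on ggplot2's default grouping (one box per Group and grp combination) instead of an explicit group aesthetic.

```r
library(ggplot2)

# Long-format percentile table, with values copied from this answer
new_df <- data.frame(
  Group = factor(rep(1:4, times = 2)),
  Percentile_0   = rep(1L, 8),
  Percentile_25  = c(61.25, 1, 1, 1, 1, 40.25, 1, 22.5),
  Percentile_50  = c(119, 1, 1, 1, 1, 86, 52.5, 122.5),
  Percentile_75  = c(153, 1, 1, 1, 1, 136, 101.5, 269.25),
  Percentile_100 = c(237L, 1L, 1L, 1L, 60L, 360L, 222L, 508L),
  grp = rep(c("GOOD", "SALINE"), each = 4)
)

# stat = "identity" tells geom_boxplot to use the supplied percentiles
# directly instead of computing them from raw observations
p <- ggplot(new_df, aes(x = Group, colour = grp)) +
  geom_boxplot(
    stat = "identity",
    aes(ymin = Percentile_0, lower = Percentile_25, middle = Percentile_50,
        upper = Percentile_75, ymax = Percentile_100)
  ) +
  # Illustrative colours; any valid R colour names would do
  scale_colour_manual(values = c(GOOD = "steelblue", SALINE = "firebrick")) +
  labs(y = "Value", colour = "Type")
```

Calling print(p) draws the plot. Rows where all five percentiles equal 1 appear as flat lines rather than boxes, which reflects the data and not a plotting problem.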

Final data frame:

structure(list(Group = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L), .Label = c("1", "2", "3", "4"), class = "factor"), Percentile_0 = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), Percentile_25 = c(61.25, 1, 1, 1, 
1, 40.25, 1, 22.5), Percentile_50 = c(119, 1, 1, 1, 1, 86, 52.5, 
122.5), Percentile_75 = c(153, 1, 1, 1, 1, 136, 101.5, 269.25
), Percentile_100 = c(237L, 1L, 1L, 1L, 60L, 360L, 222L, 508L
), grp = c("GOOD", "GOOD", "GOOD", "GOOD", "SALINE", "SALINE", 
"SALINE", "SALINE")), row.names = c(NA, -8L), class = "data.frame")