Efficient way to use geom_boxplot with specified quantiles and long data

I have a dataset that contains precomputed quantiles for each department and country. It looks like this:

df <- structure(list(quantile = c("p5", "p25", "p50", "p75", "p95", 
"p5", "p25", "p50", "p75", "p95", "p5", "p25", "p50", "p75", 
"p95", "p5", "p25", "p50", "p75", "p95"), value = c(6, 12, 20, 
33, 61, 6, 14, 23, 38, 63, 7, 12, 17, 26, 50, 7, 12, 18, 26, 
51), country = c("A", "A", "A", "A", "A", "B", "B", "B", "B", 
"B", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"), dep = c("D", 
"D", "D", "D", "D", "D", "D", "D", "D", "D", "I", "I", "I", "I", 
"I", "I", "I", "I", "I", "I"), kpi = c("F", "F", "F", "F", "F", 
"F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", 
"F", "F")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame"))

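Now I would like to build a boxplot for each department that compares the countries, using p5/p95 instead of min/max for the whiskers. It should look like the plot from the question cited below, but without outliers (so Train_number would correspond to country):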

The code for that plot is (from the question "ggplot2, geom_boxplot with custom quantiles and outliers"):

# MyData, f and q are defined in the linked question.
# fun.y was renamed to fun in ggplot2 3.3.0.
ggplot(MyData, aes(factor(Stations), Arrival_Lateness,
                   fill = factor(Train_number))) +
  stat_summary(fun.data = f, geom = "boxplot",
               position = position_dodge(1)) +
  stat_summary(aes(color = factor(Train_number)), fun = q, geom = "point",
               position = position_dodge(1))

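I tried to derive a solution from the code above and the answers to that question. Unfortunately, I don't know how to pass the necessary values from the variables quantile and value to ggplot(). Is there an argument of stat_summary() that I have missed and could use? Or is there another simple solution?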
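One way to feed the stored quantiles to ggplot() directly, without stat_summary(), is to reshape them so that each box is one row and let geom_boxplot() draw them unchanged via stat = "identity" and the ymin, lower, middle, upper and ymax aesthetics. A minimal sketch, assuming tidyr (>= 1.0, for pivot_wider()) is installed; the df below is the question's data rebuilt as a plain data.frame so the block runs on its own:

```r
library(ggplot2)
library(tidyr)

# The question's data, rebuilt as a plain data.frame
df <- data.frame(
  quantile = rep(c("p5", "p25", "p50", "p75", "p95"), 4),
  value    = c(6, 12, 20, 33, 61, 6, 14, 23, 38, 63,
               7, 12, 17, 26, 50, 7, 12, 18, 26, 51),
  country  = rep(rep(c("A", "B"), each = 5), 2),
  dep      = rep(c("D", "I"), each = 10),
  kpi      = "F"
)

# One row per dep x country box, one column per stored quantile
wide <- pivot_wider(df, names_from = quantile, values_from = value)

# stat = "identity" tells geom_boxplot() to use the supplied statistics
# as-is instead of computing them from raw observations
p <- ggplot(wide, aes(x = dep, fill = country)) +
  geom_boxplot(
    aes(ymin = p5, lower = p25, middle = p50, upper = p75, ymax = p95),
    stat = "identity",
    position = position_dodge(width = 0.9)
  )
print(p)
```

Because no outlier values are supplied, no outlier points are drawn, and the whiskers end exactly at the stored p5 and p95.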

With the data you provided, you can produce the following plot:

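library(ggplot2)

# Summary function for stat_summary(): returns the five statistics a boxplot
# needs, with p5/p95 as the whisker ends.
# Note: df already stores quantiles, so quantile() here re-estimates them from
# five numbers per group (for dep D, country A the whiskers end at 7.2 and
# 55.4 rather than the stored 6 and 61).
f <- function(x) {
  r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
  names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
  r
}

# One box per department, one panel per country
ggplot(df, aes(factor(dep), value)) +
  stat_summary(fun.data = f, geom = "boxplot",
               position = position_dodge(1)) +
  facet_grid(. ~ country, scales = "free")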

I'm not sure whether this is correct.
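If you prefer to stay with stat_summary(), one variant that avoids re-estimating the quantiles is a summary function that hands back the five stored values unchanged. A sketch, assuming each dep/country group holds exactly the five rows p5 to p95 (pass_through() is a hypothetical helper name); mapping fill = country and dodging places the countries side by side within each department, as the question asks:

```r
library(ggplot2)

# The question's data, rebuilt as a plain data.frame
df <- data.frame(
  quantile = rep(c("p5", "p25", "p50", "p75", "p95"), 4),
  value    = c(6, 12, 20, 33, 61, 6, 14, 23, 38, 63,
               7, 12, 17, 26, 50, 7, 12, 18, 26, 51),
  country  = rep(rep(c("A", "B"), each = 5), 2),
  dep      = rep(c("D", "I"), each = 10),
  kpi      = "F"
)

# Hypothetical helper: each group already holds the five stored quantiles,
# and quantiles are non-decreasing, so sorting them yields p5..p95 in order.
pass_through <- function(x) {
  stopifnot(length(x) == 5)  # assumes exactly one value per stored quantile
  r <- sort(x)
  data.frame(ymin = r[1], lower = r[2], middle = r[3],
             upper = r[4], ymax = r[5])
}

# Countries dodged side by side within each department
p <- ggplot(df, aes(x = dep, y = value, fill = country)) +
  stat_summary(fun.data = pass_through, geom = "boxplot",
               position = position_dodge(width = 0.9))
print(p)
```

The sorting step relies on the values, not the quantile labels; if a group could be missing a quantile, reshaping to wide format and using geom_boxplot(stat = "identity") is the safer route.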