Boxplot with ggplot2

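I am plotting a boxplot of forecasts together with the observed values. The real dataset is fairly long, so here is a small sample in the same format.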

forecasts <- data.frame(f_type = c(rep("A", 9), rep("B", 9)),
                        Date = c(rep(as.Date("2007-01-31"), 3), rep(as.Date("2007-02-28"), 3), rep(as.Date("2007-03-31"), 3), rep(as.Date("2007-01-31"), 3), rep(as.Date("2007-02-28"), 3), rep(as.Date("2007-03-31"), 3)),
                        value = c(10, 50, 60, 5, 90, 20, 30, 46, 39, 69, 82, 48, 65, 99, 75, 15, 49, 27))

# Named `observations` to match the plotting code below
observations <- data.frame(Dt = c(as.Date("2007-01-31"), as.Date("2007-02-28"), as.Date("2007-03-31")),
                           obs = c(30, 49, 57))

So far I have:

library(ggplot2)

ggplot() +
    geom_boxplot(data = forecasts,
                 aes(x = as.factor(Date), y = value, 
                     group = interaction(Date, f_type), fill = f_type)) +  
    geom_line(data = observations,
              aes(x = as.factor(Dt), y = obs, group = 1), 
              size = 2)

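The box and whiskers are set by ggplot2's defaults. I would like to assign these values myself so that I know exactly what range the whiskers cover. I tried passing a function to stat_summary, like this: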

f <- function(x) {
    r <- quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95))
    names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
    r
}

o <- function(x) {
    subset(x, x < quantile(x,probs = 0.05) | quantile(x,probs = 0.95) < x)
}

ggplot(forecasts, aes(x = as.factor(Date), y = value)) + 
    stat_summary(fun.data = f, geom = "boxplot", aes(group = interaction(Date, f_type), fill = f_type)) +
    stat_summary(fun.y = o, geom = "point") 

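However, this messes up the grouping: the boxes for the two forecast types end up stacked on top of each other instead of side by side. Does anyone know how to accomplish this?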

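With a little preprocessing, you can summarise the values by Date and f_type to produce the ymin, lower, middle, upper and ymax aesthetics that geom_boxplot needs (the trick is to set stat = "identity"):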

library(dplyr)
library(ggplot2)

forecasts %>% group_by(f_type, Date) %>% 
    summarise( # You can set your desired values/quantiles here
        y_min = quantile(value, 0.05),
        low = quantile(value, 0.25),
        mid = quantile(value, 0.5),
        high = quantile(value, 0.75),
        y_max = quantile(value, 0.95)
    ) %>% 
    ggplot() + 
    geom_boxplot(
        aes(
            ymin = y_min,
            lower = low,
            middle = mid,
            upper = high,
            ymax = y_max,
            x = as.factor(Date),
            fill = f_type
        ), 
        stat = "identity"
    ) + 
    geom_line(
        data = observations,
        aes(
            x = as.factor(Dt), 
            y = obs, group = 1
        ), 
        size = 2
    )
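
If you want to check the whisker ranges as numbers before plotting, the summary step can be run on its own. The following is only a minimal sketch in base R, assuming the sample data from the question; the names box_probs, groups and box_stats are made up for illustration, and quantile() is left at its default interpolation type.

```r
# Rebuild the sample data from the question.
forecasts <- data.frame(
    f_type = rep(c("A", "B"), each = 9),
    Date   = rep(rep(as.Date(c("2007-01-31", "2007-02-28", "2007-03-31")),
                     each = 3), times = 2),
    value  = c(10, 50, 60, 5, 90, 20, 30, 46, 39,
               69, 82, 48, 65, 99, 75, 15, 49, 27)
)

# Probabilities for the five box statistics (same as in the answer above).
# The names are illustrative and mirror the summarise() columns.
box_probs <- c(y_min = 0.05, low = 0.25, mid = 0.5, high = 0.75, y_max = 0.95)

# Split the values into one vector per (f_type, Date) combination.
groups <- split(forecasts$value, list(forecasts$f_type, forecasts$Date))

# One row per group, one column per quantile.
box_stats <- t(sapply(groups, quantile, probs = box_probs, names = FALSE))
colnames(box_stats) <- names(box_probs)

print(round(box_stats, 2))
```

Each row of box_stats corresponds to one box in the plot, so the y_min and y_max columns are the whisker ends that stat = "identity" will draw. Changing the entries of box_probs is the place to choose different whisker quantiles.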