Mean value on grouped boxplots in R

I want to create a grouped boxplot from my data:

X     variable value
Cat1  Var1     10
Cat2  Var1     8
Cat3  Var1     7
Cat4  Var1     15
Cat1  Var2     4
Cat2  Var2     3
Cat3  Var2     4
Cat4  Var2     1

I was able to draw it with:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))

Now I want to add extra points that show the mean value of each boxplot. I tried:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))+
    geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18)

But it draws 4 values at the same X position (shown as red dots in the plot).

I tried using facet_wrap, but I could not fix the error:

ggplot() +
    geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
    ylim(c(-5, 15))+
    geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18) +
    facet_wrap(~means$`dataFiltered$variable`, scales='free')

Error in layout_base... At least one layer must contain all variables used for facetting.

Is there a way to show the mean on a grouped boxplot?

Try adding a stat_summary() call, which computes the mean per group from the same data the boxplot uses:

library(dplyr)
library(tidyr)
library(ggplot2)
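# Rebuild the sample data: one whitespace-separated string per row
df <- data.frame(V1 = c(
  "Cat1  Var1     10",
  "Cat2  Var1     8",
  "Cat3  Var1     7",
  "Cat4  Var1     15",
  "Cat1  Var2     4",
  "Cat2  Var2     3",
  "Cat3  Var2     4",
  "Cat4  Var2     1"), stringsAsFactors = FALSE)
# Split each string into three columns; the backslash must be escaped in an R string
df2 <- df %>%
        separate(V1, c("X", "variable", "value"), sep="\\s+") %>%
        mutate(value = as.integer(value))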

ggplot(df2, aes(x=X, y=value, color=variable)) +
        geom_boxplot()+
        ylim(c(-5, 15)) + 
        # fun replaces fun.y, which is deprecated since ggplot2 3.3.0
        stat_summary(geom = "point", fun = mean, colour = "red", size = 4)

If you want a mean point for each group, placed over its own box, try this:

ggplot(df2, aes(x=X, y=value, color=variable)) +
        geom_boxplot()+
        ylim(c(-5, 15)) +
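        # fun replaces the deprecated fun.y; a dodge width of 0.75 matches the
        # default width of the boxes, so each point sits over its own box
        stat_summary(geom = "point", aes(group=variable, col=variable), 
            fun = mean, size = 4, position=position_dodge(width=0.75))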
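
For completeness, here is a sketch that stays closer to what the question attempted: compute the means first, then pass them to geom_point. The name means_df is hypothetical (the question's own means object is not shown), and the data frame is rebuilt directly from the table in the question. The key points are that the summary table keeps the same column names as the plotted data, and that the points get the same grouping and dodge as the boxes.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question, built directly
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# One mean per X/variable combination (means_df is a hypothetical name)
means_df <- df2 %>%
  group_by(X, variable) %>%
  summarise(value = mean(value), .groups = "drop")

p <- ggplot(df2, aes(x = X, y = value, color = variable)) +
  geom_boxplot() +
  ylim(c(-5, 15)) +
  # x, y and color are inherited from ggplot(); group = variable lets
  # position_dodge split the points the same way the boxes are split
  geom_point(data = means_df, aes(group = variable),
             position = position_dodge(width = 0.75),
             shape = 18, size = 4)
p
```

Because means_df contains the variable column, adding facet_wrap(~variable) to this plot should also work, since every layer then has the faceting variable.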

Keep in mind that these plots can be misleading when the sample size is small.
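
One quick way to see how many observations sit behind each box is to count the rows per X/variable combination. This is a minimal sketch that rebuilds the question's data directly; group_sizes is a name chosen here for illustration.

```r
library(dplyr)

# Sample data from the question, built directly
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# Number of observations behind each box
group_sizes <- df2 %>%
  count(X, variable, name = "n")
print(group_sizes)
```

With the sample data every combination has exactly one row, so each box collapses to a single line and the mean coincides with that one value.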