ggplot box plot subdivided by color, with means in the middle of each box

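I have data with two categorical variables. I can plot them as box plots, but I cannot get the means to appear in the right place. I have reproduced the effect with the iris dataset (the red rectangles were added by hand, not in ggplot).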

library(tidyverse)

# Split each flower into "high" or "low" by sepal length
Iris <- iris %>%
        mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

# Mean sepal width for each Species x SepalLengthType combination
means <- Iris %>% 
        group_by(Species, SepalLengthType) %>% 
        summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")

plot <- ggplot(data = Iris, aes(y = Sepal.Width, x = SepalLengthType, colour = Species)) +
        geom_boxplot()

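Now I want to add the mean to each box plot. The lines below all run, but the means are centered on the SepalLengthType category rather than on the individual boxes.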

plot + stat_summary(fun = "mean" , aes(color = Species), shape = 15)
plot + stat_summary(fun = "mean" , aes(group = Species), shape = 15)
plot + stat_summary(fun.y = "mean", shape = 15) # this works, but is deprecated
plot + geom_point(data = means, aes(color = Species), shape = 15)

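How can I show the mean in the middle of each box plot? I appreciate that I could rearrange the data so that each group of data points is in its own column, but since the groups are not all the same length, that would need its own workaround.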

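When I use fun = "mean", I get the warning "Removed 5 rows containing missing values (geom_segment)." Why is that? The 'means' line does not have this problem, but I would rather not calculate the means myself.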

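You can use position = position_dodge(0.9), as in the code below. The boxes and the mean points are dodged by the same width (0.9), so each mean lands on its own box.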

library(tidyverse)

Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

means <- Iris %>% 
  group_by(Species, SepalLengthType) %>% 
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")

plot <- ggplot(data = Iris, aes(y=Sepal.Width, x = SepalLengthType, colour = Species))+
  geom_boxplot(position=position_dodge(0.9))

plot + geom_point(data = means, aes(color = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))

Or use stat_summary:

plot + stat_summary(fun = "mean", aes(group = Species), shape = 15, 
                  position = position_dodge2(width = 0.9))
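
On the warning: as far as I can tell, stat_summary draws a pointrange by default, and when only fun is supplied there is no ymin or ymax for the range, so the line segment of each of the 5 groups is dropped with that message. A minimal sketch of one way to avoid it is to ask for geom = "point" so that only the mean is drawn. The names dodge and p, the size = 3 setting, and the use of base R transform() instead of mutate() are my own choices for illustration and are not part of the code above.

```r
library(ggplot2)

# Same derived column as above, built with base R so only ggplot2 is needed
Iris <- transform(iris, SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

# One dodge specification shared by both layers so they stay aligned
dodge <- position_dodge(width = 0.9)

p <- ggplot(Iris, aes(x = SepalLengthType, y = Sepal.Width, colour = Species)) +
  geom_boxplot(position = dodge) +
  # geom = "point" draws only the mean, so no ymin/ymax are required
  stat_summary(fun = mean, geom = "point", shape = 15, size = 3, position = dodge)

p
```

Sharing a single position object between the two layers is just a convenience: it keeps the dodge width from drifting apart if one layer is edited later.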