ggplot box plot subdivided by color, with means in the middle of each box plot
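I have data with two categorical variables. I can plot these as box plots, but I can't get the means to show in the right place.
I have reproduced the effect with the iris dataset (the red rectangles were added by hand, not in ggplot).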
library(tidyverse)
# Label each flower "high" or "low" by sepal length
Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))
# Mean sepal width for each Species x SepalLengthType group
means <- Iris %>%
  group_by(Species, SepalLengthType) %>%
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")
plot <- ggplot(data = Iris, aes(y = Sepal.Width, x = SepalLengthType, colour = Species)) +
  geom_boxplot()
Now I want to add the mean to each box plot.
The lines below all run, but the means are centred on the SepalLengthType category rather than on each box plot.
plot + stat_summary(fun = "mean", aes(color = Species), shape = 15)
plot + stat_summary(fun = "mean", aes(group = Species), shape = 15)
plot + stat_summary(fun.y = "mean", shape = 15) # this works, but is deprecated
plot + geom_point(data = means, aes(color = Species), shape = 15)
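How can I show the mean in the middle of each box plot?
I appreciate that I could rearrange the data so that each set of data points is in its own column, but since they are not all the same length, that would need its own workaround.
When I use fun = "mean", I get the warning "Removed 5 rows containing missing values (geom_segment)." Why is that? The 'means' line doesn't have this problem, but I would rather not calculate the means myself.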
You can use position = position_dodge(0.9), as in the code below:
library(tidyverse)
Iris <- iris %>%
  mutate(SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))
means <- Iris %>%
  group_by(Species, SepalLengthType) %>%
  summarise(Sepal.Width = mean(Sepal.Width), .groups = "keep")
plot <- ggplot(data = Iris, aes(y = Sepal.Width, x = SepalLengthType, colour = Species)) +
  geom_boxplot(position = position_dodge(0.9))
plot + geom_point(data = means, aes(color = Species), shape = 15,
                  position = position_dodge2(width = 0.9))
Or use stat_summary, like this:
plot + stat_summary(fun = "mean", aes(group = Species), shape = 15,
                    position = position_dodge2(width = 0.9))
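As a possible variation on the above, here is a minimal sketch that shares one dodge object between the boxes and the means, and sets geom = "point" in stat_summary. The warning in the question most likely comes from stat_summary's default geom, pointrange, which expects ymin and ymax; when only fun is supplied those values are missing and the range segments are dropped. The names dodge and p and the size = 3 setting are illustrative choices, not part of the original code, and fun requires ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Same grouping as in the question, built with base R so only ggplot2 is needed
Iris <- transform(iris, SepalLengthType = ifelse(Sepal.Length > 5.8, "high", "low"))

# One dodge object shared by both layers (0.9 is the width used in the answer;
# the name `dodge` is an illustrative choice)
dodge <- position_dodge(width = 0.9)

p <- ggplot(Iris, aes(x = SepalLengthType, y = Sepal.Width, colour = Species)) +
  geom_boxplot(position = dodge) +
  # geom = "point" draws only the mean, so ymin/ymax are not required
  stat_summary(fun = mean, geom = "point", shape = 15, size = 3, position = dodge)

print(p)
```

Because both layers use the same position_dodge width, each mean should be placed with the same offset as its box. The "high" category holds only two boxes, since no setosa flower has a sepal length above 5.8.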