Mean value on grouped boxplots in R
I want to create a grouped boxplot from my data:
X variable value
Cat1 Var1 10
Cat2 Var1 8
Cat3 Var1 7
Cat4 Var1 15
Cat1 Var2 4
Cat2 Var2 3
Cat3 Var2 4
Cat4 Var2 1
I was able to produce it with:
ggplot() +
geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
ylim(c(-5, 15))
Now I want to add extra points that show the mean of each boxplot. I tried:
ggplot() +
geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
ylim(c(-5, 15))+
geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18)
But it shows 4 values at the same X position (the red dots in the figure below).
I tried using facet_wrap, but I could not fix the resulting error:
ggplot() +
geom_boxplot(aes(x=dataFiltered$X, y=dataFiltered$value, color=dataFiltered$variable))+
ylim(c(-5, 15))+
geom_point(stat="identity", aes(x=means$`dataFiltered$X`, y=means$`dataFiltered$value`), col = "red",pch=18) +
facet_wrap(~means$`dataFiltered$variable`, scales='free')
Error in layout_base... At least one layer must contain all variables used for facetting.
Is there a way to show the mean on grouped boxplots?
Try adding a stat_summary() call:
library(dplyr)
library(tidyr)
library(ggplot2)
# One string per row of the sample data
df <- data.frame(V1 = c(
  "Cat1 Var1 10",
  "Cat2 Var1 8",
  "Cat3 Var1 7",
  "Cat4 Var1 15",
  "Cat1 Var2 4",
  "Cat2 Var2 3",
  "Cat3 Var2 4",
  "Cat4 Var2 1"))
# Split on whitespace; the backslash must be doubled inside an R string
df2 <- df %>%
  separate(V1, c("X", "variable", "value"), sep = "\\s+") %>%
  mutate(value = as.integer(value))
ggplot(df2, aes(x = X, y = value, color = variable)) +
  geom_boxplot() +
  ylim(c(-5, 15)) +
  # fun.y was deprecated in ggplot2 3.3.0; use fun instead
  stat_summary(geom = "point", fun = mean, colour = "red", size = 4)
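As a quick check of what the red points in this first plot represent: because the colour is fixed to red, the summary layer is (as far as I can tell) grouped by X only, so each point is the mean of Var1 and Var2 pooled together. A minimal base R sketch that computes the same numbers; the name pooled_means is just for illustration:

```r
# Rebuild the sample data from the question
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# One mean per X, pooling Var1 and Var2 together
pooled_means <- aggregate(value ~ X, data = df2, FUN = mean)
print(pooled_means)
```

That gives one value per category rather than one per box, which is why a per-group version is needed.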
If you want a mean for each group, try this:
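ggplot(df2, aes(x = X, y = value, color = variable)) +
  geom_boxplot() +
  ylim(c(-5, 15)) +
  # group by variable so each box gets its own mean; a dodge width of 0.75
  # matches the default spacing of dodged boxplots (fun.y is deprecated, use fun)
  stat_summary(geom = "point", aes(group = variable),
               fun = mean, size = 4, position = position_dodge(width = 0.75))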
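Keep in mind that these plots can be misleading when the sample size is small. In the sample data above, each X/variable combination has only one observation, so every box collapses to a single line and its mean is just that value.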
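If you would rather keep the precomputed means from the question, a sketch along these lines should also work. It assumes the means live in their own data frame with the same column names as the raw data (here called means, computed with base R aggregate()), and that a dodge width of 0.75 matches the default boxplot spacing. Passing data frames through data= instead of dataFiltered$... vectors inside aes() is also what facet_wrap needs in order to find its variables.

```r
library(ggplot2)

# Rebuild the sample data from the question
df2 <- data.frame(
  X        = rep(c("Cat1", "Cat2", "Cat3", "Cat4"), times = 2),
  variable = rep(c("Var1", "Var2"), each = 4),
  value    = c(10, 8, 7, 15, 4, 3, 4, 1)
)

# One mean per X/variable combination
means <- aggregate(value ~ X + variable, data = df2, FUN = mean)

p <- ggplot(df2, aes(x = X, y = value, color = variable)) +
  geom_boxplot() +
  ylim(c(-5, 15)) +
  # Points come from the means data frame; grouping by variable plus
  # dodging places each point on its own box instead of stacking them
  geom_point(data = means, aes(group = variable),
             color = "red", shape = 18, size = 4,
             position = position_dodge(width = 0.75))
print(p)
```

The original attempt stacked the points because the geom_point layer had no grouping and no dodge position, so all means for a category were drawn at the same x.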