How does ggplot scale_continuous work combined with stat_summary

Question

I have some data that looks like this (there is actually a lot more of it):

size program group percent
1    prog1   1     50
2    prog1   1     0.1
1    prog1   2     75
2    prog1   2     1 
1    prog2   1     55
2    prog2   1     2
1    prog2   2     70
2    prog2   2     4

I would like to plot it using something like this:

plot1 <- ggplot(tbl, aes(size, percent, group=group, color=group))+
         geom_point()+
         stat_summary(fun.y=gm_mean, geom='line')+
         scale_x_continuous(trans=log2_trans())+
         scale_y_continuous(trans=log2_trans())

gm_mean <- function(x) {
  exp(mean(log(x)))
}

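If I try to run this, I get this warning: In loop_apply(n, do.ply) : NaNs produced

I printed the values I was getting inside gm_mean and found that they are not the actual values I expected; they look like the log2 of those values (0.1 became -3.3, and then I think log() caused the NaN).

Does this mean that using fun.y=mean would actually calculate the geometric mean when using scale_y_continuous(trans=log2_trans())? If not, how would you get the geometric mean? And if so, how would you get the actual mean in log scale if I needed it?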

What I want to do is draw a kind of scatter plot and then draw 2 geometric-mean lines (one per group), but on a log2 scale.

Answer 1

Does this mean that using fun.y=mean would actually calculate the geometric mean when using scale_y_continuous(trans=log2_trans())?

Yes, I think that is exactly what it means. Here are some examples with trial data:

#packages
library(ggplot2)
library(scales)

#gm_mean function: geometric mean via the mean of the natural logs
gm_mean <- function(x){exp(mean(log(x)))}

#trial data
df <- data.frame(x=sample(1:5, size=100, replace=TRUE), 
                 group=factor(sample(c(1,2,3), size=100, replace=TRUE)))

df$y <- df$x*as.numeric(as.character(df$group))+rnorm(100)+1

#create one outlier for easier visual differences between geomean and arithmean
df[df$x==1&df$group==1,][1,'y'] <- 30

#create base plots (ggplot() + geom_point() replaces the deprecated qplot())
d <- ggplot(df, aes(x=x, y=y, group=group, color=group)) + geom_point() + theme_bw()
d2 <- d + scale_y_continuous(trans=log2_trans()) +
          scale_x_continuous(trans=log2_trans())

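#comparing different plots
#(quartz() opens a graphics device on macOS only; dev.new() is a portable substitute.
# In ggplot2 >= 3.3.0 the fun.y argument is deprecated in favour of fun.)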
quartz(width=4, height=4)
d + stat_summary(fun.y=mean, geom='line') + labs(title='untransformed arithmean')
quartz(width=4, height=4)
d + stat_summary(fun.y=gm_mean, geom='line') + labs(title='untransformed geomean')
quartz(width=4, height=4)
d2 + stat_summary(fun.y=mean, geom='line') + labs(title='transformed arithmean')
quartz(width=4, height=4)
d2 + stat_summary(fun.y=gm_mean, geom='line') + labs(title='transformed geomean')

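Observations:

On the transformed plot with the arithmetic mean, the line for group 1 matches the line on the untransformed plot with the geometric mean. So you are right.
By contrast, the same line on the untransformed plot with the arithmetic mean is distorted much more by the artificial outlier.
The same line on the transformed plot with the geometric mean makes no sense and runs into lots of NaN problems, because the summary function already receives the log2-transformed values. You should not use the geometric mean together with a log-transformed axis.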
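The reasoning behind these observations can be checked numerically without any plotting. This is a minimal base R sketch, assuming the two percent values for group 1 at size 2 from the question's sample data; the variable names (percent, on_log2, geo_direct, geo_via_mean, bad) are made up for illustration.

```r
# Geometric mean as defined in the question
gm_mean <- function(x) exp(mean(log(x)))

# Assumed input: group 1, size 2 from the question's sample rows
percent <- c(0.1, 2)

# With trans=log2_trans(), the summary function sees these values instead
on_log2 <- log2(percent)

# Geometric mean computed directly on the raw values
geo_direct <- gm_mean(percent)

# Arithmetic mean on the log2 scale, mapped back to the raw scale
geo_via_mean <- 2^mean(on_log2)

# Both routes give the same number (up to floating point error)
print(all.equal(geo_direct, geo_via_mean))

# Applying gm_mean to the already transformed values takes log() of a
# negative number (log2(0.1) is about -3.3), which produces NaN
bad <- suppressWarnings(gm_mean(on_log2))
print(is.nan(bad))
```

So on a log2-transformed axis, a plain mean already draws the geometric mean, and the NaN warning comes from taking a second logarithm of negative log2 values.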

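Another thought: are you sure you want stat_summary() and not stat_smooth()? Your data only has two points on the x axis, so I can't be sure, but in my opinion, and with the example data, these plots are more informative: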

quartz(width=4, height=4)
d2 + stat_smooth(method='lm', formula=y~x, se=F) + 
     labs(title='stat_smooth transformed')
quartz(width=4, height=4)
d + stat_smooth(method='lm', formula=y~I(2^log(x,2)), se=F) + 
     labs(title='stat_smooth untransformed')
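Coming back to the second half of the question (how to get the actual arithmetic mean when the axis is log2-transformed): since the summary function receives log2 values, one option is to undo the transform inside the function, average, and transform back. The following is a sketch under stated assumptions, not a definitive recipe: tbl re-creates the eight sample rows from the question, arith_mean_log2 is a hypothetical helper name, and the fun argument assumes ggplot2 >= 3.3.0 (older versions use fun.y, as in the code above).

```r
# Assumed data: the eight sample rows shown in the question
tbl <- data.frame(
  size    = c(1, 2, 1, 2, 1, 2, 1, 2),
  program = rep(c("prog1", "prog2"), each = 4),
  group   = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  percent = c(50, 0.1, 75, 1, 55, 2, 70, 4)
)

# Hypothetical helper: y arrives as log2(percent), so back-transform,
# take the arithmetic mean, and return it on the log2 scale again
arith_mean_log2 <- function(y) log2(mean(2^y))

# The plotting part runs only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  base <- ggplot(tbl, aes(size, percent, group = group, color = group)) +
    geom_point() +
    scale_x_continuous(trans = "log2") +
    scale_y_continuous(trans = "log2")

  # Geometric mean line: plain mean() of the log2 values
  p_geo <- base + stat_summary(fun = mean, geom = "line")

  # Arithmetic mean line, still drawn on log2 axes
  p_arith <- base + stat_summary(fun = arith_mean_log2, geom = "line")
}
```

Printing p_geo or p_arith draws the respective plot. The helper hard-codes base 2, so it would need adjusting for a different axis transformation.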

