How to plot multiple prediction estimates with prediction intervals (linear regression)

Question

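I have a data frame containing predictions and prediction intervals for two categorical (binary) variables, and I would like to plot them all in one graph.

Example data frame (df):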

  block condition response      fit      lwr      upr
1     1    reward      yes 3388.629 2089.910 4687.348
2     2    reward      yes 3372.682 2074.191 4671.173
....

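The options are: reward + yes, reward + no, no reward + yes, no reward + no.

I would like to get this type of graph, but including all the options (four lines, I think):

I tried to adapt the following code, without success: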

library("ggplot2")
p <- ggplot(df, aes(?, ?)) +
  geom_point() +
  stat_smooth(method = lm)
# 3. Add prediction intervals
p + geom_line(aes(y = lwr), color = "red", linetype = "dashed")+
    geom_line(aes(y = upr), color = "red", linetype = "dashed")

I would like to use ggplot, but I don't know how to get all 4 lines into it. Any help would be great!

Answer 1

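The plot you included in your question shows two continuous variables plotted against each other, so you can't use that type of plot for your data. You say you have two categorical variables, but you actually have three: block, condition and response.

If you want to plot upper, lower and fitted values for two categorical variables, the natural way to do it in ggplot is with geom_errorbar, putting one categorical variable on the x axis and representing the other by the color of the bars. The bars are "dodged" at each x axis position so that they sit side by side rather than on top of each other.

You have a few options for handling the different blocks. If you only have a small number of blocks, you can put each block in its own facet. If you have a large number, you may want to aggregate the data by averaging across blocks.

The two-row data frame you provided isn't enough to tell which would be better, so let's make up a similar data set. Here we assume there are 4 blocks: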

set.seed(1)

df <- data.frame(block     = rep(1:4, 4), 
                 condition = rep(rep(c("reward", "no reward"), each = 4), 2),
                 response  = rep(c("yes", "no"), each = 8),
                 fit       = rep(c(3400, 3200, 3300, 2900), each = 4) + 
                             rnorm(16, 0, 10),
                 lwr       = rep(c(2080, 1900, 2000, 1600), each = 4) + 
                             rnorm(16, 0, 10),
                 upr       = rep(c(4700, 4500, 4600, 4200), each = 4) + 
                             rnorm(16, 0, 10))

df
#>    block condition response      fit      lwr      upr
#> 1      1    reward      yes 3393.735 2079.838 4703.877
#> 2      2    reward      yes 3401.836 2089.438 4699.462
#> 3      3    reward      yes 3391.644 2088.212 4686.229
#> 4      4    reward      yes 3415.953 2085.939 4695.850
#> 5      1 no reward      yes 3203.295 1909.190 4496.057
#> 6      2 no reward      yes 3191.795 1907.821 4499.407
#> 7      3 no reward      yes 3204.874 1900.746 4511.000
#> 8      4 no reward      yes 3207.383 1880.106 4507.632
#> 9      1    reward       no 3305.758 2006.198 4598.355
#> 10     2    reward       no 3296.946 1999.439 4597.466
#> 11     3    reward       no 3315.118 1998.442 4606.970
#> 12     4    reward       no 3303.898 1985.292 4605.567
#> 13     1 no reward       no 2893.788 1595.218 4193.112
#> 14     2 no reward       no 2877.853 1604.179 4192.925
#> 15     3 no reward       no 2911.249 1613.587 4203.646
#> 16     4 no reward       no 2899.551 1598.972 4207.685

So we have the same column names and column order as your own data, and roughly similar values in the numeric columns.
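For context, columns named fit, lwr and upr are what predict() returns for an lm fit when interval = "prediction" is requested. The following is only a sketch of how a data frame shaped like df could be produced. The raw data set `raw`, the outcome `rt`, the model formula and the simulated effect sizes are all made up for illustration, since the question does not show the underlying model.

```r
# Hypothetical raw data: 'raw', 'rt' and the effect sizes are invented here
set.seed(42)
raw <- expand.grid(block     = 1:4,
                   condition = c("reward", "no reward"),
                   response  = c("yes", "no"),
                   rep       = 1:10)
raw$rt <- 3000 + 200 * (raw$condition == "reward") +
  150 * (raw$response == "yes") + rnorm(nrow(raw), 0, 600)

# Linear model with block as a numeric predictor (an assumption)
model <- lm(rt ~ block + condition * response, data = raw)

# One row per block/condition/response combination to predict for
newdata <- expand.grid(block     = 1:4,
                       condition = c("reward", "no reward"),
                       response  = c("yes", "no"))

# predict() returns a matrix with columns fit, lwr and upr;
# binding it to newdata gives the same layout as df
pred_df <- cbind(newdata,
                 predict(model, newdata = newdata, interval = "prediction"))
head(pred_df)
```

A data frame built this way can be passed to the plotting code in place of df.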

The obvious way to plot this in ggplot is:

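library(ggplot2)

# Relabel blocks as "Block 1", "Block 2", ... so each facet gets a readable title
ggplot(within(df, block <- paste("Block", block)),
       aes(condition, fit, color = response, group = response)) +
  # Error bars span the prediction interval; dodging puts yes/no side by side.
  # linewidth needs ggplot2 >= 3.4.0; use size = 1.5 on older versions.
  geom_errorbar(aes(ymin = lwr, ymax = upr), linewidth = 1.5,
                width = 0.25, position = position_dodge(width = 0.25)) +
  # Fitted values, dodged by the same amount as the error bars
  geom_point(position = position_dodge(width = 0.25), color = "black") +
  facet_wrap(. ~ block, nrow = 2) +
  theme_bw()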

If you want to aggregate across blocks, you can get a single panel:

library(ggplot2)
library(dplyr)

df %>%
  group_by(condition, response) %>%
  # Average the fitted values and interval limits across blocks
  summarise(across(c(fit, lwr, upr), mean), .groups = "drop") %>%
  ggplot(aes(condition, fit, color = response, group = response)) +
  # linewidth needs ggplot2 >= 3.4.0; use size = 1.5 on older versions
  geom_errorbar(aes(ymin = lwr, ymax = upr), linewidth = 1.5,
                width = 0.25, position = position_dodge(width = 0.25)) +
  geom_point(position = position_dodge(width = 0.25), color = "black") +
  theme_bw()
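To see the numbers behind the single-panel plot, the same averaging can be sketched in base R with aggregate(). The example data frame is rebuilt first so the snippet runs on its own. The name agg is introduced here for illustration and is not part of the original answer. Note that this takes a plain mean of the interval limits across blocks, just as the dplyr pipeline does.

```r
# Rebuild the example data so this snippet is self-contained
set.seed(1)
df <- data.frame(
  block     = rep(1:4, 4),
  condition = rep(rep(c("reward", "no reward"), each = 4), 2),
  response  = rep(c("yes", "no"), each = 8),
  fit       = rep(c(3400, 3200, 3300, 2900), each = 4) + rnorm(16, 0, 10),
  lwr       = rep(c(2080, 1900, 2000, 1600), each = 4) + rnorm(16, 0, 10),
  upr       = rep(c(4700, 4500, 4600, 4200), each = 4) + rnorm(16, 0, 10)
)

# Mean of fit, lwr and upr for each condition/response combination
agg <- aggregate(cbind(fit, lwr, upr) ~ condition + response,
                 data = df, FUN = mean)
agg
```

The result has one row for each of the four condition/response combinations, which are the four error bars drawn in the aggregated plot.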

Edit

An alternative is to plot block on the x axis and facet by condition:

# Block on the x axis, one facet row per condition
ggplot(within(df, block <- paste("Block", block)),
       aes(block, fit, color = response, group = response)) +
  # linewidth needs ggplot2 >= 3.4.0; use size = 1.5 on older versions
  geom_errorbar(aes(ymin = lwr, ymax = upr), linewidth = 1.5,
                width = 0.25, position = position_dodge(width = 0.25)) +
  geom_point(position = position_dodge(width = 0.25), color = "black") +
  facet_grid(condition ~ .) +
  theme_bw()
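If the goal really is four lines on a single panel, as the question describes, one possible sketch is to treat block as an ordered sequence on the x axis and draw one line with a shaded band per condition/response combination. This assumes the blocks are consecutive (for example successive blocks of trials), which the question does not state. The column `group` and the object `p` are names made up for this example.

```r
library(ggplot2)

# Rebuild the example data so this snippet is self-contained
set.seed(1)
df <- data.frame(
  block     = rep(1:4, 4),
  condition = rep(rep(c("reward", "no reward"), each = 4), 2),
  response  = rep(c("yes", "no"), each = 8),
  fit       = rep(c(3400, 3200, 3300, 2900), each = 4) + rnorm(16, 0, 10),
  lwr       = rep(c(2080, 1900, 2000, 1600), each = 4) + rnorm(16, 0, 10),
  upr       = rep(c(4700, 4500, 4600, 4200), each = 4) + rnorm(16, 0, 10)
)

# One level per condition/response combination, giving four lines
df$group <- interaction(df$condition, df$response, sep = " + ")

p <- ggplot(df, aes(block, fit, color = group, fill = group)) +
  # Shaded band from lwr to upr; color = NA removes the band outline
  geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.15, color = NA) +
  geom_line() +
  geom_point() +
  theme_bw()
p
```

With intervals as wide as the ones in this example the bands overlap heavily, so adding facet_wrap(~ group) may be easier to read than a single panel.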
