Why is converting emmeans contrasts to a data.frame not reporting correct p-values?

Question

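The p-values for the contrasts I am running are not carried over correctly when I convert the result to a data.frame. Why does this happen, and how can I fix it?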

Console output from emmeans:

> pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
$`simple contrasts for Status`
Stim = 1, Treatment = None:
 contrast               estimate     SE   df t.ratio p.value
 Control - Subclinical  -0.24213 0.0571 57.5 -4.241  0.0002 
 Control - Clinical     -0.16275 0.0571 57.5 -2.851  0.0164 
 Subclinical - Clinical  0.07938 0.0571 57.5  1.390  0.3526

Console output from the emmeans data.frame:

> mod.EMM <- pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
> as.data.frame(mod.EMM)
   Stim Treatment      Status               contrast     estimate         SE       df      t.ratio      p.value
1    1      None           .  Control - Subclinical -0.242125000 0.05709000 57.46544  -4.24111052 3.680551e-03
2    1      None           .     Control - Clinical -0.162750000 0.05709000 57.46544  -2.85076195 2.721389e-01
3    1      None           . Subclinical - Clinical  0.079375000 0.05709000 57.46544   1.39034857 1.000000e+00

Reproducible example:

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)

library(emmeans)
pairs(emmeans(model1, ~ Type*Treatment), simple="each")
# $`simple contrasts for Type`
# Treatment = nonchilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
# 
# Treatment = chilled:
#   contrast             estimate   SE df t.ratio p.value
# Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
# 
# 
# $`simple contrasts for Treatment`
# Type = Quebec:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled     3.58 1.85 79 1.934   0.0566 
# 
# Type = Mississippi:
#   contrast             estimate   SE df t.ratio p.value
# nonchilled - chilled    10.14 1.85 79 5.477   <.0001

as.data.frame(pairs(emmeans(model1, ~ Type*Treatment), simple="each"))
#    Treatment        Type             contrast  estimate       SE df  t.ratio      p.value
# 1 nonchilled           . Quebec - Mississippi  9.380952 1.851185 79 5.067538 1.036140e-05
# 2    chilled           . Quebec - Mississippi 15.938095 1.851185 79 8.609670 2.252161e-12
# 3          .      Quebec nonchilled - chilled  3.580952 1.851185 79 1.934410 2.265719e-01
# 4          . Mississippi nonchilled - chilled 10.138095 1.851185 79 5.476542 1.995066e-06

Update from outside help:

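"It seems that the result of pairs() is not itself an emmGrid object that can be converted to a data frame, but rather a list of two emmGrid objects. If you pull either of those objects out of the list by position, using [[]], like this,

pairs(emmeans(model1, ~ Type*Treatment), simple = "each")[[2]]

then you can call data.frame() on each result, and it is correct. You end up with two different data frames holding the contrasts for the two different variables, but each of those data frames has the right p-values."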
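As a minimal sketch of that suggestion on the CO2 example, each element can be pulled out by position and converted separately. The object names `cons`, `type_df` and `trt_df` are illustrative and do not come from the original post.

```r
library(emmeans)

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data = CO2)
cons <- pairs(emmeans(model1, ~ Type*Treatment), simple = "each")

# cons is a list with one emmGrid per factor; extract each by position
type_df <- as.data.frame(cons[[1]])  # simple contrasts for Type
trt_df  <- as.data.frame(cons[[2]])  # simple contrasts for Treatment

# each data frame carries the same p-values as the console display
trt_df
```

This gives two correct data frames, but not yet the single combined one the question asks for.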

I am hoping someone has a better solution, so that I can combine all of the contrasts into a single data.frame.

Answer 1

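The different p-values you are seeing reflect the difference between unadjusted p-values and p-values adjusted for multiple comparisons. In the reproducible example, each family printed by pairs() contains a single comparison, so the console shows unadjusted values, whereas the data frame treats all four contrasts as one family and reports adjusted values (for example, 0.0566 becomes 0.2266).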

The ?emmeans::pairs documentation tells us:

Ordinarily, when simple is a list or "each", the return value is an emm_list object with each entry in correspondence with the entries of simple. However, with combine = TRUE, the elements are all combined into one family of contrasts in a single emmGrid object using rbind.emmGrid. In that case, the adjust argument sets the adjustment method for the combined set of contrasts.

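So, for your reproducible example, you can combine all of the simple main effects into one data frame by setting the combine argument to TRUE. You can choose between unadjusted and adjusted p-values by setting the adjust argument.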

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)

> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+               adjust = "none")
 Treatment  Type        contrast             estimate   SE df t.ratio p.value
 nonchilled .           Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
 chilled    .           Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
 .          Quebec      nonchilled - chilled     3.58 1.85 79 1.934   0.0566 
 .          Mississippi nonchilled - chilled    10.14 1.85 79 5.477   <.0001
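Because combine = TRUE returns a single emmGrid object rather than a list, a sketch of getting the one data frame the question asks for would be to pass that result to as.data.frame(). The names `emm`, `all_con` and `all_df` are illustrative.

```r
library(emmeans)

model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data = CO2)
emm <- emmeans(model1, ~ Type*Treatment)

# one emmGrid holding all four simple contrasts, with no multiplicity adjustment
all_con <- pairs(emm, simple = "each", combine = TRUE, adjust = "none")

# a single data frame with the unadjusted p-values
all_df <- as.data.frame(all_con)
all_df
```

In this example every family has only one comparison, so the unadjusted values are the same as those in the per-family console display.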

Here is the Bonferroni-adjusted version:

> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+               adjust = "bonferroni")
 Treatment  Type        contrast             estimate   SE df t.ratio p.value
 nonchilled .           Quebec - Mississippi     9.38 1.85 79 5.068   <.0001 
 chilled    .           Quebec - Mississippi    15.94 1.85 79 8.610   <.0001 
 .          Quebec      nonchilled - chilled     3.58 1.85 79 1.934   0.2266 
 .          Mississippi nonchilled - chilled    10.14 1.85 79 5.477   <.0001 

P value adjustment: bonferroni method for 4 tests
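The link between the two sets of numbers can be checked with base R alone. The sketch below copies the t ratios and degrees of freedom from the data frame in the question, computes the unadjusted two-sided p-values, and applies a Bonferroni adjustment over the 4 tests. The variable names are illustrative.

```r
# t ratios and residual df copied from the as.data.frame() output in the question
t_ratio  <- c(5.067538, 8.609670, 1.934410, 5.476542)
df_resid <- 79

# unadjusted two-sided p-values from the t distribution
p_raw <- 2 * pt(-abs(t_ratio), df = df_resid)

# Bonferroni over all 4 tests: multiply each p-value by 4, capped at 1
p_bonf <- p.adjust(p_raw, method = "bonferroni")

round(p_raw[3], 4)   # third contrast, unadjusted
round(p_bonf[3], 4)  # third contrast, Bonferroni-adjusted
```

The third contrast comes out at about 0.0566 unadjusted and about 0.2266 after adjustment, which are the two values that appear to disagree in the question.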

Answer 2

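This can be done fairly easily, but what you have to do is get the basic output and then plug in the correct P values. To illustrate, I will show a different example in which one of the factors has more than two levels.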

require(emmeans)
#> Loading required package: emmeans

warp.lm = lm(breaks ~ wool * tension, data = warpbreaks)
(cons = pairs(emmeans(warp.lm, ~ wool * tension), simple = "each"))
#> $`simple contrasts for wool`
#> tension = L:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B       16.33 5.16 48  3.167  0.0027 
#> 
#> tension = M:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B       -4.78 5.16 48 -0.926  0.3589 
#> 
#> tension = H:
#>  contrast estimate   SE df t.ratio p.value
#>  A - B        5.78 5.16 48  1.120  0.2682 
#> 
#> 
#> $`simple contrasts for tension`
#> wool = A:
#>  contrast estimate   SE df t.ratio p.value
#>  L - M      20.556 5.16 48  3.986  0.0007 
#>  L - H      20.000 5.16 48  3.878  0.0009 
#>  M - H      -0.556 5.16 48 -0.108  0.9936 
#> 
#> wool = B:
#>  contrast estimate   SE df t.ratio p.value
#>  L - M      -0.556 5.16 48 -0.108  0.9936 
#>  L - H       9.444 5.16 48  1.831  0.1704 
#>  M - H      10.000 5.16 48  1.939  0.1389 
#> 
#> P value adjustment: tukey method for comparing a family of 3 estimates

# get the estimates, etc. into a data frame:
df = as.data.frame(cons)

# get the Tukey-adjusted P values:
pv = unlist(lapply(unlist(cons), function(x) as.data.frame(x)$p.value))

# replace the p values and display
df$p.value = pv
df
#>   tension wool contrast   estimate       SE df    t.ratio      p.value
#> 1       L    .    A - B 16.3333333 5.157299 48  3.1670322 0.0026768025
#> 2       M    .    A - B -4.7777778 5.157299 48 -0.9264108 0.3588672592
#> 3       H    .    A - B  5.7777778 5.157299 48  1.1203107 0.2681556374
#> 4       .    A    L - M 20.5555556 5.157299 48  3.9857208 0.0006572745
#> 5       .    A    L - H 20.0000000 5.157299 48  3.8779987 0.0009185485
#> 6       .    A    M - H -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 7       .    B    L - M -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 8       .    B    L - H  9.4444444 5.157299 48  1.8312771 0.1703517915
#> 9       .    B    M - H 10.0000000 5.157299 48  1.9389993 0.1388570254

Created on 2021-03-15 by the reprex package (v1.0.0)

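The combine = TRUE approach does not work with anything other than adjust = "none", because the family size becomes the total number of contrasts. Moreover, the Tukey method applies only to a single set of pairwise comparisons. A combination of two or more sets of pairwise comparisons is not itself a set of pairwise comparisons, so it cannot be adjusted using the Tukey method.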

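I still do not recommend doing this if the goal is to present the results to others, because looking at this data frame makes it very unclear how the P-value adjustments were done and with respect to which families. In this example we have five families of comparisons (one wool comparison at each of the three tension levels, and three tension comparisons for each of the two wools); the original annotated display of cons makes that clear, while the listing of df does not.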
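If a single data frame is still wanted, one possible way to reduce that ambiguity is to record the family of each row in an extra column. The sketch below is only an illustration: the `family` column is a hypothetical addition, and the code assumes the combined data frame uses "." as the placeholder in the `wool` and `tension` columns, as in the output above.

```r
library(emmeans)

warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
cons <- pairs(emmeans(warp.lm, ~ wool * tension), simple = "each")

# combined estimates, then the per-family p-values plugged in, as in the answer
df <- as.data.frame(cons)
df$p.value <- unlist(lapply(seq_along(cons),
                            function(i) as.data.frame(cons[[i]])$p.value))

# hypothetical extra column: the family each contrast was adjusted within
df$family <- ifelse(df$wool == ".",
                    paste("wool contrasts, tension =", df$tension),
                    paste("tension contrasts, wool =", df$wool))

table(df$family)  # number of comparisons in each family
```

A reader of the data frame can then see which rows share an adjustment, although the adjustment method itself still has to be stated separately.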