Why is converting emmeans contrasts to a data.frame not reporting correct p-values?
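The p-values from the contrasts I ran are not carried over correctly when I convert them to a data.frame. Why is this happening, and how can I fix it?
Console output from emmeans: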
> pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
$`simple contrasts for Status`
Stim = 1, Treatment = None:
contrast estimate SE df t.ratio p.value
Control - Subclinical -0.24213 0.0571 57.5 -4.241 0.0002
Control - Clinical -0.16275 0.0571 57.5 -2.851 0.0164
Subclinical - Clinical 0.07938 0.0571 57.5 1.390 0.3526
Console output after converting the emmeans result to a data.frame:
> mod.EMM <- pairs(emmeans(lmer.mod, ~ Status*Stim*Treatment), simple = "each")
> as.data.frame(mod.EMM)
Stim Treatment Status contrast estimate SE df t.ratio p.value
1 1 None . Control - Subclinical -0.242125000 0.05709000 57.46544 -4.24111052 3.680551e-03
2 1 None . Control - Clinical -0.162750000 0.05709000 57.46544 -2.85076195 2.721389e-01
3 1 None . Subclinical - Clinical 0.079375000 0.05709000 57.46544 1.39034857 1.000000e+00
Reproducible example:
model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)
library(emmeans)
pairs(emmeans(model1, ~ Type*Treatment), simple="each")
# $`simple contrasts for Type`
# Treatment = nonchilled:
# contrast estimate SE df t.ratio p.value
# Quebec - Mississippi 9.38 1.85 79 5.068 <.0001
#
# Treatment = chilled:
# contrast estimate SE df t.ratio p.value
# Quebec - Mississippi 15.94 1.85 79 8.610 <.0001
#
#
# $`simple contrasts for Treatment`
# Type = Quebec:
# contrast estimate SE df t.ratio p.value
# nonchilled - chilled 3.58 1.85 79 1.934 0.0566
#
# Type = Mississippi:
# contrast estimate SE df t.ratio p.value
# nonchilled - chilled 10.14 1.85 79 5.477 <.0001
as.data.frame(pairs(emmeans(model1, ~ Type*Treatment), simple="each"))
# Treatment Type contrast estimate SE df t.ratio p.value
# 1 nonchilled . Quebec - Mississippi 9.380952 1.851185 79 5.067538 1.036140e-05
# 2 chilled . Quebec - Mississippi 15.938095 1.851185 79 8.609670 2.252161e-12
# 3 . Quebec nonchilled - chilled 3.580952 1.851185 79 1.934410 2.265719e-01
# 4 . Mississippi nonchilled - chilled 10.138095 1.851185 79 5.476542 1.995066e-06
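Update from outside help:
“It seems that the result of pairs() is not itself an emmGrid object that can be converted to a data frame, but a list containing two emmGrid objects. If you extract either of these objects from the list by position with [[]], like this,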
pairs(emmeans(model1, ~ Type*Treatment), simple = "each")[[2]]
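then you can data.frame() each result and it will be correct. You end up with two separate data frames holding the contrasts for the two different variables, but each of those data frames has the correct p-values."
I'm hoping someone has a better solution, so that I can combine all of the contrasts into a single data.frame.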
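The extraction described in the quote can be applied to every component at once instead of one index at a time. A minimal sketch, assuming the CO2 model from the question; `cons` and `per_family` are just illustrative names:

```r
library(emmeans)

# Model from the reproducible example
model1 <- lm(uptake ~ Type + Treatment + conc + Type * Treatment, data = CO2)

# pairs(..., simple = "each") returns an emm_list: one emmGrid per factor
cons <- pairs(emmeans(model1, ~ Type * Treatment), simple = "each")

# Convert every component separately; each data frame keeps the
# p-values of its own family, as in the printed output
per_family <- lapply(cons, as.data.frame)

names(per_family)
per_family[[2]]   # same as as.data.frame(cons[[2]])
```

This still leaves one data frame per component, which is exactly the limitation the question is asking about.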
The different p-values you are seeing reflect unadjusted p-values versus p-values adjusted for multiple comparisons.
The documentation (?emmeans::pairs) tells us:
Ordinarily, when simple is a list or "each", the return value is an emm_list object with each entry in correspondence with the entries of simple. However, with combine = TRUE, the elements are all combined into one family of contrasts in a single emmGrid object using rbind.emmGrid. In that case, the adjust argument sets the adjustment method for the combined set of contrasts.
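So, for your reproducible example, you can combine all of the simple main effects into a single data frame by setting the combine argument to TRUE. You can choose between unadjusted and adjusted p-values by setting the adjust argument.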
model1 <- lm(uptake ~ Type + Treatment + conc + Type*Treatment, data=CO2)
> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+ adjust = "none")
Treatment Type contrast estimate SE df t.ratio p.value
nonchilled . Quebec - Mississippi 9.38 1.85 79 5.068 <.0001
chilled . Quebec - Mississippi 15.94 1.85 79 8.610 <.0001
. Quebec nonchilled - chilled 3.58 1.85 79 1.934 0.0566
. Mississippi nonchilled - chilled 10.14 1.85 79 5.477 <.0001
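To get the combined result as an actual data frame, the single emmGrid returned with combine = TRUE can be passed straight to as.data.frame(). A minimal sketch, assuming (as I understand emmeans to behave) that as.data.frame() summarizes the object with the adjust setting stored by pairs(); `combined` and `combined_df` are illustrative names:

```r
library(emmeans)

model1 <- lm(uptake ~ Type + Treatment + conc + Type * Treatment, data = CO2)

# combine = TRUE puts all simple contrasts into one emmGrid (one family),
# and adjust decides how that single family is adjusted
combined <- pairs(emmeans(model1, ~ Type * Treatment),
                  simple = "each", combine = TRUE, adjust = "none")

# One data frame holding all four contrasts
combined_df <- as.data.frame(combined)
combined_df
```

Swapping adjust = "none" for adjust = "bonferroni" gives the adjusted version in the same single-table form.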
And here it is with a Bonferroni adjustment:
> pairs(emmeans(model1, ~ Type*Treatment), simple = "each", combine = TRUE,
+ adjust = "bonferroni")
Treatment Type contrast estimate SE df t.ratio p.value
nonchilled . Quebec - Mississippi 9.38 1.85 79 5.068 <.0001
chilled . Quebec - Mississippi 15.94 1.85 79 8.610 <.0001
. Quebec nonchilled - chilled 3.58 1.85 79 1.934 0.2266
. Mississippi nonchilled - chilled 10.14 1.85 79 5.477 <.0001
P value adjustment: bonferroni method for 4 tests
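The Bonferroni figure can be checked by hand. A minimal base-R sketch, using the t ratio and degrees of freedom copied from the output above (`t_ratio` and `resid_df` are just local names), shows that 0.2266 is four times the unadjusted 0.0566. It is also the value that as.data.frame() reported for this contrast in the question (2.265719e-01), which suggests the data frame in the question was adjusted over all four contrasts as one Bonferroni family:

```r
# t ratio and df for "Quebec: nonchilled - chilled", copied from the output above
t_ratio <- 1.934410
resid_df <- 79

# Two-sided unadjusted p-value from the t distribution
p_unadj <- 2 * pt(-abs(t_ratio), df = resid_df)

# Bonferroni over the four combined contrasts, i.e. min(1, 4 * p)
p_bonf <- p.adjust(p_unadj, method = "bonferroni", n = 4)

round(c(unadjusted = p_unadj, bonferroni = p_bonf), 4)
```

The same reasoning applies to the lmer output in the question, except that the family there is much larger, so the inflation is bigger.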
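This can be done easily enough: you take the basic output and then insert the correct P values. To illustrate, I'll show a different example in which one of the factors has more than two levels.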
require(emmeans)
#> Loading required package: emmeans
warp.lm = lm(breaks ~ wool * tension, data = warpbreaks)
(cons = pairs(emmeans(warp.lm, ~ wool * tension), simple = "each"))
#> $`simple contrasts for wool`
#> tension = L:
#> contrast estimate SE df t.ratio p.value
#> A - B 16.33 5.16 48 3.167 0.0027
#>
#> tension = M:
#> contrast estimate SE df t.ratio p.value
#> A - B -4.78 5.16 48 -0.926 0.3589
#>
#> tension = H:
#> contrast estimate SE df t.ratio p.value
#> A - B 5.78 5.16 48 1.120 0.2682
#>
#>
#> $`simple contrasts for tension`
#> wool = A:
#> contrast estimate SE df t.ratio p.value
#> L - M 20.556 5.16 48 3.986 0.0007
#> L - H 20.000 5.16 48 3.878 0.0009
#> M - H -0.556 5.16 48 -0.108 0.9936
#>
#> wool = B:
#> contrast estimate SE df t.ratio p.value
#> L - M -0.556 5.16 48 -0.108 0.9936
#> L - H 9.444 5.16 48 1.831 0.1704
#> M - H 10.000 5.16 48 1.939 0.1389
#>
#> P value adjustment: tukey method for comparing a family of 3 estimates
# get the estimates, etc. into a data frame:
df = as.data.frame(cons)
# get the Tukey-adjusted P values:
pv = unlist(lapply(unlist(cons), function(x) as.data.frame(x)$p.value))
# replace the p values and display
df$p.value = pv
df
#> tension wool contrast estimate SE df t.ratio p.value
#> 1 L . A - B 16.3333333 5.157299 48 3.1670322 0.0026768025
#> 2 M . A - B -4.7777778 5.157299 48 -0.9264108 0.3588672592
#> 3 H . A - B 5.7777778 5.157299 48 1.1203107 0.2681556374
#> 4 . A L - M 20.5555556 5.157299 48 3.9857208 0.0006572745
#> 5 . A L - H 20.0000000 5.157299 48 3.8779987 0.0009185485
#> 6 . A M - H -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 7 . B L - M -0.5555556 5.157299 48 -0.1077222 0.9936237722
#> 8 . B L - H 9.4444444 5.157299 48 1.8312771 0.1703517915
#> 9 . B M - H 10.0000000 5.157299 48 1.9389993 0.1388570254
Created on 2021-03-15 by the reprex package (v1.0.0)
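The combine = TRUE approach only reproduces the original results with adjust = "none", because with any other adjustment the family size becomes the total number of contrasts. Moreover, the Tukey method can only be applied to a single set of pairwise comparisons. A combination of two or more sets of pairwise comparisons is not itself a set of pairwise comparisons, so it cannot be adjusted with the Tukey method.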
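If the goal is to present the results to others, I still don't recommend doing this, because looking at this data frame it is very unclear how the P-value adjustments were done and to which families they were applied. In this example there are five families of comparisons (a single wool comparison at each of the three tension levels, and three tension comparisons at each of the two wool levels); the original annotated display of cons makes this clear, whereas the listing of df does not.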
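If a single table is still needed, one way to reduce that ambiguity is to carry the family information along explicitly instead of relying on the as.data.frame(cons) listing. A minimal sketch, assuming each component of cons has exactly one conditioning (by) factor, as in the warpbreaks example; the `family` and `family_size` columns are illustrative names, not emmeans output:

```r
library(emmeans)

# Same model and contrasts as in the reprex above
warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks)
cons <- pairs(emmeans(warp.lm, ~ wool * tension), simple = "each")

factors <- c("wool", "tension")

# Summarize each component on its own so it keeps its own (Tukey) adjustment,
# and record which family each row belongs to
rows <- lapply(seq_along(cons), function(i) {
  d <- as.data.frame(cons[[i]])
  by_var <- intersect(factors, names(d))   # the conditioning factor
  data.frame(
    family   = paste0(setdiff(factors, by_var), " contrasts | ",
                      by_var, " = ", d[[by_var]]),
    contrast = as.character(d$contrast),
    estimate = d$estimate,
    SE       = d$SE,
    df       = d$df,
    t.ratio  = d$t.ratio,
    p.value  = d$p.value,
    stringsAsFactors = FALSE
  )
})
combined <- do.call(rbind, rows)

# Number of contrasts in each family
combined$family_size <- ave(combined$p.value, combined$family, FUN = length)
combined
```

Because each component is summarized separately, the p-values are the per-family ones from the annotated display of cons, and the family column makes the grouping visible in the table itself.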