How to combine groups in Poisson regression to estimate contrast?
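I'm not sure whether this is a programming question or a statistics question (i.e., a gap in my understanding).
I have a Poisson mixed model that I want to use to compare the mean counts of the groups across different time periods.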
library(lme4)      # glmer()
library(emmeans)   # emmeans(), contrast(), regrid()
mod <- glmer(Y ~ TX_GROUP * time + (1|ID), data = dat, family = poisson)
mod_em <- emmeans(mod, c("TX_GROUP","time"), type = "response")
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
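Now I want to estimate the marginal count for the combined time period (2 + 3) in each group. Isn't this simply a matter of exponentiating the sum of the logged counts from the following?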
contrast(mod_em, list(`2 + 3` = c(0, 0, 1, 0, 1, 0)))
contrast(mod_em, list(`2 + 3` = c(0, 0, 0, 1, 0, 1)))
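If I try that, the value does not come close to the simple mean of the combined groups.
With the example data from the package it seems to work fine, although I would use grouping in the formula instead (second example below):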
> warp.lm <- lm(breaks ~ wool*tension, data = warpbreaks)
> warp.emm <- emmeans(warp.lm, c("tension", "wool"))
> warp.emm
tension wool emmean SE df lower.CL upper.CL
L A 44.55556 3.646761 48 37.22325 51.88786
M A 24.00000 3.646761 48 16.66769 31.33231
H A 24.55556 3.646761 48 17.22325 31.88786
L B 28.22222 3.646761 48 20.88992 35.55453
M B 28.77778 3.646761 48 21.44547 36.11008
H B 18.77778 3.646761 48 11.44547 26.11008
Confidence level used: 0.95
For wool A, the sum of L and M should be about 44 + 24 ≈ 68; for wool B, about 28 + 28 ≈ 56.
> contrast(warp.emm, list(A.LM = c(1, 1, 0, 0, 0, 0),
+ B.LM = c(0, 0, 0, 1, 1, 0)))
contrast estimate SE df t.ratio p.value
A.LM 68.55556 5.157299 48 13.293 <.0001
B.LM 57.00000 5.157299 48 11.052 <.0001
With the grouping in the formula instead:
> warp.em2 <- emmeans(warp.lm, ~tension|wool)
> contrast(warp.em2, list(LM = c(1, 1, 0)))
wool = A:
contrast estimate SE df t.ratio p.value
LM 68.55556 5.157299 48 13.293 <.0001
wool = B:
contrast estimate SE df t.ratio p.value
LM 57.00000 5.157299 48 11.052 <.0001
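As a sanity check on the warpbreaks output (a sketch that only reuses the printed EMMs; it does not refit the model), a contrast with coefficients of 1 is just the sum of the corresponding EMMs. In this balanced cell-means `lm` the six cell means are uncorrelated and share one SE, so the SE of a two-term sum is `sqrt(2)` times that SE:

```r
# EMMs (tension x wool) copied from the warp.emm printout above
emm <- c(L.A = 44.55556, M.A = 24.00000, H.A = 24.55556,
         L.B = 28.22222, M.B = 28.77778, H.B = 18.77778)
se  <- 3.646761  # common SE of every cell mean in this balanced design

# A contrast is a weighted sum of the EMMs
A.LM <- sum(c(1, 1, 0, 0, 0, 0) * emm)   # L + M within wool A
B.LM <- sum(c(0, 0, 0, 1, 1, 0) * emm)   # L + M within wool B

# Uncorrelated estimates: Var(a + b) = Var(a) + Var(b)
se.LM <- sqrt(2 * se^2)

print(c(A.LM = A.LM, B.LM = B.LM, SE = se.LM))
```

This works directly on the response scale only because the `lm` uses an identity link; on a log-link model the same coefficients act on log counts.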
First, I suggest putting your two contrasts in a single list, e.g.,
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
             `2+3|1` = c(0, 0, 0, 1, 0, 1))
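You have to decide when you want to back-transform. See the vignette on transformations, and note the discussion of "timing is everything" there. The two basic options are:
Option 1: obtain the marginal means of the log counts, then back-transform: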
# contrast() drops the log transformation; update() puts it back
mod_con = update(contrast(mod_emm, contr), tran = "log")
summary(mod_con, type = "response")
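[The update() call is needed because contrast() strips off the transformation except in special cases: it does not always know what scale to assign to an arbitrary linear function. For example, the difference of two square roots is not on the square-root scale.]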
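To see what option 1 yields with the `contr` coefficients (which are 1s, not 1/2s), here is a base-R sketch using the log-scale EMMs for `TX_GROUP = 0` that appear later in this thread. Summing on the log scale and then back-transforming gives the *product* of the two rates, because exp(a + b) = exp(a) * exp(b):

```r
# Log-scale EMMs for TX_GROUP = 0 at times 2 and 3 (from the mod_em_inj2 output)
log_rate <- c(t2 = 0.8938574, t3 = 1.5195610)

# Option 1: form the contrast c(1, 1) on the log scale, then back-transform
opt1 <- exp(sum(c(1, 1) * log_rate))

# The same quantity written as a product of the response-scale rates
prod_rates <- prod(exp(log_rate))

print(c(option1 = opt1, product = prod_rates))
```

So with coefficients of 1, option 1 does not estimate a combined (summed) count; that quantity comes from option 2.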
Option 2: back-transform the predictions, then sum them:
mod_emmr = regrid(mod_emm)
contrast(mod_emmr, contr)
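Option 2 amounts to summing the back-transformed rates. A base-R sketch using the response-scale rates printed in the question reproduces the 7.014759 and 3.144075 reported later in the thread (up to rounding of the displayed digits):

```r
# Response-scale rates from the question; rows = TX_GROUP, columns = time 2, 3
rate <- rbind(`0` = c(t2 = 2.444541, t3 = 4.570218),
              `1` = c(t2 = 1.467247, t3 = 1.676827))

# Option 2: back-transform first, then apply the contrast c(1, 1) within each group
coefs <- c(1, 1)
opt2 <- apply(rate, 1, function(r) sum(coefs * r))
print(opt2)
```

The SE that `contrast()` reports on the regridded object also accounts for the covariance between the two time periods, which this arithmetic does not attempt.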
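The difference between these results is like the difference between a geometric mean (option 1) and an arithmetic mean (option 2). I doubt that either will exactly reproduce the raw marginal mean counts, since both are predictions based on your model. Personally, I think option 1 is the better choice, because summing is a linear operation and the model is linear on the log scale.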
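The geometric-versus-arithmetic-mean comparison is exact when the contrast averages, i.e. uses weights of 1/2 such as `c(0, 0, .5, 0, .5, 0)`; with the weights of 1 in `contr`, option 1 back-transforms to a product rather than a geometric mean. A short base-R illustration with the `TX_GROUP = 0` rates from the question:

```r
rate <- c(t2 = 2.444541, t3 = 4.570218)   # TX_GROUP = 0, response scale
w <- c(0.5, 0.5)                           # averaging weights

# Option 1 with averaging weights: average the logs, then back-transform
geo_mean <- exp(sum(w * log(rate)))        # equals sqrt(prod(rate))

# Option 2 with averaging weights: back-transform, then average
arith_mean <- sum(w * rate)

print(c(geometric = geo_mean, arithmetic = arith_mean))
```

By the AM-GM inequality the geometric mean never exceeds the arithmetic mean, so option 1 comes out lower whenever the two rates differ.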
Addendum
There is actually a third option: create a grouping variable. I'll illustrate with the pigs dataset.
> pigs.lm <- lm(log(conc) ~ source + factor(percent), data = pigs)
Here are the EMMs for percent:
> emmeans(pigs.lm, "percent")
percent emmean SE df lower.CL upper.CL
9 3.445307 0.04088810 23 3.360723 3.529890
12 3.624861 0.03837600 23 3.545475 3.704248
15 3.662706 0.04372996 23 3.572244 3.753168
18 3.745156 0.05296030 23 3.635599 3.854713
Results are averaged over the levels of: source
Results are given on the log (not the response) scale.
Confidence level used: 0.95
Now let's create a grouping factor, group:
> pigs.emm = add_grouping(ref_grid(pigs.lm), "group", "percent", c("1&2","1&2","3&4","3&4"))
> str(pigs.emm)
'emmGrid' object with variables:
source = fish, soy, skim
percent = 9, 12, 15, 18
group = 1&2, 3&4
Nesting structure: percent %in% group
Transformation: “log”
Now get the EMMs for group; note that they are just the averages of the individual levels:
> emmeans(pigs.emm, "group")
group emmean SE df lower.CL upper.CL
1&2 3.535084 0.02803816 23 3.477083 3.593085
3&4 3.703931 0.03414907 23 3.633288 3.774574
Results are averaged over the levels of: source, percent
Results are given on the log (not the response) scale.
Confidence level used: 0.95
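A quick base-R check that the `group` EMMs are plain averages of the `percent` EMMs, using only the printed values (the SEs, by contrast, come from the model's covariance matrix and are not reproduced here):

```r
# Log-scale EMMs for percent, from the emmeans(pigs.lm, "percent") output
pct_emm <- c(`9` = 3.445307, `12` = 3.624861, `15` = 3.662706, `18` = 3.745156)

# The same grouping that was passed to add_grouping()
grp <- c("1&2", "1&2", "3&4", "3&4")

# Average the percent EMMs within each group
grp_emm <- sapply(split(pct_emm, grp), mean)
print(grp_emm)
```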
Here is the summary on the response scale:
> summary(.Last.value, type = "response")
group response SE df lower.CL upper.CL
1&2 34.29790 0.961650 23 32.36517 36.34605
3&4 40.60662 1.386678 23 37.83703 43.57893
Results are averaged over the levels of: source, percent
Confidence level used: 0.95
Intervals are back-transformed from the log scale
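The response-scale table can be reproduced from the log-scale one: the estimates and interval limits are exponentiated, and the printed SEs match the delta-method approximation SE(exp(x)) ≈ exp(x) * SE(x). A base-R sketch using the printed numbers:

```r
# Log-scale group EMMs and SEs from emmeans(pigs.emm, "group")
log_emm <- c(`1&2` = 3.535084, `3&4` = 3.703931)
log_se  <- c(`1&2` = 0.02803816, `3&4` = 0.03414907)

response <- exp(log_emm)                # back-transformed estimates
resp_se  <- response * log_se           # delta method: d/dx exp(x) = exp(x)
lower    <- exp(c(3.477083, 3.633288))  # back-transformed lower CL

print(cbind(response, resp_se, lower))
```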
These are averages rather than sums, but otherwise it works, and the transformation is not stripped away the way it is in contrast().
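Thanks. The second method works for me, but the first (which seems more intuitive) does not; it doesn't seem to give me back-transformed values: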
(mod_em_inj <- emmeans(mod_inj, c("TX_GROUP","time"), type = "response"))
TX_GROUP time rate SE df asymp.LCL asymp.UCL
0 1 5.743158 0.4566671 Inf 4.914366 6.711723
1 1 5.529303 0.4639790 Inf 4.690766 6.517741
0 2 2.444541 0.2981097 Inf 1.924837 3.104564
1 2 1.467247 0.2307103 Inf 1.078103 1.996855
0 3 4.570218 0.4121428 Inf 3.829795 5.453790
1 3 1.676827 0.2472920 Inf 1.255904 2.238826
# Marginal means for combined period (7 - 24 months) - Method 1
(mod_em_inj2 <- emmeans(mod_inj, c("TX_GROUP","time")))
TX_GROUP time emmean SE df asymp.LCL asymp.UCL
0 1 1.7480092 0.07951497 Inf 1.59216273 1.9038557
1 1 1.7100619 0.08391274 Inf 1.54559591 1.8745278
0 2 0.8938574 0.12194916 Inf 0.65484147 1.1328734
1 2 0.3833880 0.15724024 Inf 0.07520279 0.6915732
0 3 1.5195610 0.09018011 Inf 1.34281119 1.6963107
1 3 0.5169035 0.14747615 Inf 0.22785558 0.8059515
contr = list(`2+3|0` = c(0, 0, 1, 0, 1, 0),
`2+3|1` = c(0, 0, 0, 1, 0, 1))
summary(contrast(mod_em_inj2, contr), type = "response")
contrast estimate SE df z.ratio p.value
2+3|0 2.4134184 0.1541715 Inf 15.654 <.0001
2+3|1 0.9002915 0.2198023 Inf 4.096 <.0001
# Marginal means for combined period (7 - 24 months) - Method 2
mod_emmr = regrid(mod_em_inj)
contrast(mod_emmr, contr)
contrast estimate SE df z.ratio p.value
2+3|0 7.014759 0.5169870 Inf 13.569 <.0001
2+3|1 3.144075 0.3448274 Inf 9.118 <.0001
7.01 and 3.14 are the values I should be getting. Apologies if I've missed something obvious in your answer.
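A base-R check of the Method 1 output above, using only the printed numbers: its estimates are exactly the sums of the log-scale EMMs for times 2 and 3, i.e. the contrast was never back-transformed. That is consistent with the answer's note that `contrast()` strips the transformation, since the Method 1 code omits the `update(..., tran = "log")` step. Summing the back-transformed rates instead reproduces Method 2:

```r
# Log-scale EMMs for times 2 and 3 (from the mod_em_inj2 output); rows = TX_GROUP
log_emm <- rbind(`0` = c(t2 = 0.8938574, t3 = 1.5195610),
                 `1` = c(t2 = 0.3833880, t3 = 0.5169035))

# What Method 1 printed: the contrast c(1, 1) applied on the log scale
log_sum <- apply(log_emm, 1, sum)

# What Method 2 printed: back-transform each EMM, then sum
sum_of_rates <- apply(exp(log_emm), 1, sum)

print(cbind(log_sum, sum_of_rates))
```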