Log response ratios with 'metafor' when some means are zero
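I am using the 'metafor' package in R to compute log response ratios. Some of my means are zero, which appears to be the cause of the warnings after my escalc command (because log(0) is -Inf). The metafor package provides a way to add a small value to zeros to avoid this. The documentation states: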
"Cell entries with a zero can be problematic especially for the relative risk and the odds ratio. Adding a small constant to the cells of the 2 × 2 tables is a common solution to this problem [...] When to = "only0", the value of add is added to each cell of the 2 × 2 tables only in those tables with at least one cell equal to 0."
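For some reason this does not fix my problem, perhaps because my data are not in the form of a 2x2 table? (They are the output of summarizing with ddply from the plyr package, in a format similar to this example.) Do I have to replace the zeros with a small number by hand, or is there a more elegant way? (Note that in this example the rows with zeros also have a sample size of 1, so they have no variance and would be dropped from the analysis anyway. I just want to know how this works for the future.)
Reproducible example: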
# Output of dput(Bin_Y_count_summary_wide), assigned to dat so the example runs on its own
dat <-
structure(list(Species.ID = c("CAFERANA", "TR11", "TR118", "TR500",
"TR504", "TR9", "TR9_US1"), Y_num_mean.early = c(2, 147.375,
4.5, 0.5, 12.5, 93.4523809523809, 5), N.early = c(1L, 4L, 2L,
4L, 4L, 7L, 2L), sd.early = c(NA, 174.699444284558, 6.36396103067893,
1, 22.4127939653523, 137.506118190001, 7.07106781186548), se.early = c(NA,
87.3497221422789, 4.5, 0.5, 11.2063969826762, 51.9724274972283,
5), Y_num_mean.late = c(0, 3.625, 2.98482142857143, 0.8, 3, 47.2,
0), N.late = c(1L, 4L, 7L, 10L, 10L, 8L, 1L), sd.late = c(NA,
7.25, 5.10407804830748, 1.75119007154183, 8.03118920210451, 40.7351024477486,
NA), se.late = c(NA, 3.625, 1.9291601697265, 0.553774924194538,
2.53968501984006, 14.4020335865659, NA), Y_num_mean.wet = c(NA,
71.5, 0, 12, 27, 0, NA), N.wet = c(NA, 2L, 1L, 2L, 2L, 2L, NA
), sd.wet = c(NA, 17.6776695296637, NA, 9.89949493661167, 38.1837661840736,
0, NA), se.wet = c(NA, 12.5, NA, 7, 27, 0, NA)), row.names = c(NA,
7L), .Names = c("Species.ID", "Y_num_mean.early", "N.early",
"sd.early", "se.early", "Y_num_mean.late", "N.late", "sd.late",
"se.late", "Y_num_mean.wet", "N.wet", "sd.wet", "se.wet"), class = "data.frame", reshapeWide = structure(list(
v.names = c("Y_num_mean", "N", "sd", "se"), timevar = "early_or_late",
idvar = "Species.ID", times = c("early", "late", "wet"),
varying = structure(c("Y_num_mean.early", "N.early", "sd.early",
"se.early", "Y_num_mean.late", "N.late", "sd.late", "se.late",
"Y_num_mean.wet", "N.wet", "sd.wet", "se.wet"), .Dim = c(4L,
3L))), .Names = c("v.names", "timevar", "idvar", "times",
"varying")))
# The warning is produced by this command
library(metafor)
test <- escalc(measure="ROM", m1i=Y_num_mean.early, sd1i=sd.early, n1i=N.early,
               m2i=Y_num_mean.late, sd2i=sd.late, n2i=N.late,
               data=dat, add=1/2, to="only0")
The passage you quote applies to measures that can be computed from 2x2 tables (i.e., RR, OR, RD, AS, and PETO). The add and to arguments have no effect on measures such as SMD and ROM.
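The only way for the mean of a ratio-scale variable (which is what response ratios assume) to be 0 is if every single value is equal to 0. Hence, by definition, the variance must then also be 0. This holds whether the sample size is 1 (in which case the variance is of course also 0) or larger.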
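A small base R illustration of that point, using made-up vectors: a set of non-negative measurements with a mean of 0 has no spread at all. Note one practical wrinkle, which is a property of R's sd() rather than part of the argument above: with a single observation, sd() uses an n - 1 denominator and returns NA, which is why rows with N = 1 in a ddply summary carry NA instead of 0.

```r
# Non-negative measurements with a mean of 0 must all be 0
x <- c(0, 0, 0, 0)
mean_x <- mean(x)   # mean of the all-zero sample
sd_x   <- sd(x)     # spread of the all-zero sample

# A single observation: sd() divides by n - 1 = 0 and yields NA
sd_single <- sd(0)
```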
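In general, the log response ratio cannot be computed whenever at least one of the two means is 0. Of course, one could start adding some kind of constant to the means by hand (and likewise to the SDs), but that seems rather arbitrary. The adjustments we can make to the counts in 2x2 tables are motivated by statistical theory (those adjustments are actually bias reductions, which also happen to make certain measures computable when there are counts of 0).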
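To see why a zero mean breaks the calculation, it can help to compute the log response ratio and its usual large-sample variance by hand, log(m1/m2) and sd1^2/(n1*m1^2) + sd2^2/(n2*m2^2). The sketch below does this in base R. log_rr() is a hypothetical helper name, not a metafor function; the first row uses the TR11 early and late values from the question, and the other two rows are made up to show a zero in each position.

```r
# Log response ratio and its large-sample variance, computed by hand.
# log_rr() is a hypothetical helper for illustration only.
log_rr <- function(m1, sd1, n1, m2, sd2, n2) {
  yi <- log(m1 / m2)
  vi <- sd1^2 / (n1 * m1^2) + sd2^2 / (n2 * m2^2)
  # A row is usable only if both the estimate and its variance are finite
  data.frame(yi = yi, vi = vi, usable = is.finite(yi) & is.finite(vi))
}

res <- log_rr(
  m1  = c(147.375, 2, 0), sd1 = c(174.699444284558, 1, 0), n1 = c(4, 3, 3),
  m2  = c(3.625,   0, 5), sd2 = c(7.25,             0, 2), n2 = c(4, 3, 3)
)

# Keep only the rows where the measure can actually be computed
res_ok <- res[res$usable, ]
```

Only the first row survives the filter. A zero in the denominator gives log(Inf) = Inf, a zero in the numerator gives log(0) = -Inf, and in both cases the variance involves 0/0, so dropping such rows (rather than patching them with a constant) is consistent with the reasoning above.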