How to use get() function vectorized when creating new columns in data.tables?

Question

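I am trying to fit a distribution to my data from a data.table and create a new data.table containing the date, the estimated parameters, and the implied 75th percentile. However, when I compute the 75th percentile, I noticed that my code does not use the sd column. Consider the following example code: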

library(data.table)
library(fitdistrplus)

distribution <- 'norm'
dt <- data.table(Date = c('2012', '2013', '2014'),
                 mean = 1:3,
                 sd = c(0.1, 0.2, 0.3))

x <- rnorm(100, 1, 0.2)
# The code should also work for distributions whose parameters are not mean and sd.
paramNames <- names(fitdist(x, distr = distribution)$estimate)
qFunctionName <- eval(get(paste0('q',distribution)))
qName <- paste0('percentile', '75')

print(dt[, eval(qName) := qFunctionName(p = 0.75, get(paramNames))])

#    Date mean  sd percentile75
# 1: 2012    1 0.1      1.67449
# 2: 2013    2 0.2      2.67449
# 3: 2014    3 0.3      3.67449

dt[1, percentile75] == qnorm(0.75, mean = 1, sd = 0.1)
# > FALSE
dt[1, percentile75] == qnorm(0.75, mean = 1, sd = 1)
# > TRUE

Apparently, get() does not take the sd column into account. How can I change the code so that it fetches all columns whose names are stored in the paramNames object?

Answer 1

library(data.table)
library(fitdistrplus)

distribution <- 'norm'
dt <- data.table(Date = c('2012', '2013', '2014'),
    mean = 1:3,
    sd = c(0.1, 0.2, 0.3))

set.seed(0L)
x <- rnorm(100, 1, 0.2)
paramNames <- names(fitdist(x, distr = distribution)$estimate)
qFunctionName <- match.fun(paste0('q',distribution))
qName <- paste0('percentile', '75')

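# mget() fetches every column named in paramNames as a named list;
# do.call() passes that list to the quantile function as named arguments.
dt[, (qName) := do.call(qFunctionName, c(list(p=0.75), mget(paramNames)))][]
# Compare doubles with all.equal() rather than ==.
all.equal(dt[1, percentile75], qnorm(0.75, mean = 1, sd = 0.1))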

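In short, get only returns the first one when a vector is passed to x, so you need mget (try get(c("x", "y")) where y is not defined). In the question's code this means the quantile function received only the mean column and fell back to its default sd = 1, which is why the result matched qnorm(0.75, mean = 1, sd = 1).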
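
A minimal base-R sketch of the difference between looking up one name and looking up several. The environment `cols` is a hypothetical stand-in for the columns of dt and is not part of the original code:

```r
# Hypothetical stand-in for the columns of dt (an assumption for illustration)
cols <- list2env(list(mean = 1:3, sd = c(0.1, 0.2, 0.3)))
paramNames <- c("mean", "sd")

# get() looks up a single name and returns that one object
one <- get(paramNames[1], envir = cols)

# mget() looks up every name and returns a named list, one element per name
params <- mget(paramNames, envir = cols)
str(params)
```

Inside `dt[, ...]` the columns are visible by name, which is why a plain `mget(paramNames)` is enough in the answer's code.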

You also need do.call to construct and execute the function call.
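
As a rough illustration, do.call takes a function and a list of arguments, so a named list of parameters becomes named arguments in the call. The variable names `params`, `qFun`, `viaDoCall`, and `direct` below are made up for this sketch:

```r
# Named list of parameters, as mget() would return for the mean and sd columns
params <- list(mean = 1:3, sd = c(0.1, 0.2, 0.3))

# Resolve the quantile function from its name
qFun <- match.fun("qnorm")

# Build and run qnorm(p = 0.75, mean = ..., sd = ...) from the list
viaDoCall <- do.call(qFun, c(list(p = 0.75), params))

# The same call written out by hand
direct <- qnorm(p = 0.75, mean = 1:3, sd = c(0.1, 0.2, 0.3))

all.equal(viaDoCall, direct)
```

Because the arguments are matched by name, the same pattern should carry over to distributions whose parameters are not called mean and sd, as long as the names in the list match the quantile function's arguments.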

Also, because of floating-point precision issues, do not use == to test doubles for equality. There are many R questions about this.
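
A small sketch of why == is unreliable for doubles, using the classic 0.1 + 0.2 case rather than the data from the question:

```r
a <- 0.1 + 0.2

# Exact comparison fails because 0.1 + 0.2 is not stored as exactly 0.3
exact <- a == 0.3                      # FALSE

# all.equal() compares within a small tolerance; wrap it in isTRUE()
# because it returns a description string instead of FALSE on mismatch
approx <- isTRUE(all.equal(a, 0.3))    # TRUE

print(c(exact = exact, approx = approx))
```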

r

vectorization

data.table