How do I iterate over several lists and matrices to call a function using parallel processing in R?
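I have been trying to use an R function called ipsi, which takes the arguments (a, y, id, time, x.trt, x.out, delta.seq, nsplits). Originally, the components of these arguments were in a single data frame (except delta.seq and nsplits, which are coded later). My understanding is that I need to put them in separate lists, and, in the case of x.trt and x.out, in matrices. The function is easy to run with a single set of arguments. However, I replicated the data frame 30 times before splitting it into the separate elements used as ipsi arguments, so I now want to iterate over the sets of elements 30 times, as if there were 30 data frames. I would also like to parallelize this to make the most of my computing power.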
I just extended the npcausal example:
n <- 500
T <- 4
time <- rep(1:T, n)
time <- rep(list(time), 30)   # 30 identical copies
id <- rep(1:n, rep(T, n))
id <- rep(list(id), 30)
x.trt <- matrix(rnorm(n * T * 5), nrow = n * T)
x.trt <- rep(list(x.trt), 30)
x.out <- matrix(rnorm(n * T * 5), nrow = n * T)
x.out <- rep(list(x.out), 30)
a <- rbinom(n * T, 1, .5)
a <- rep(list(a), 30)
y <- rnorm(mean=1,n)
y <- rep(list(y), 30)
d.seq <- seq(0.1, 5, length.out = 10)
d.seq <- rep(list(d.seq), 30)
set.seed(500, kind = "L'Ecuyer-CMRG")
numcores <- future::availableCores()
cl <- parallel::makeCluster(numcores)
parallel::clusterEvalQ(cl, library(dplyr))
parallel::clusterEvalQ(cl, library(npcausal))
parallel::clusterExport(cl, "d.seq", envir = environment())
parallel::clusterEvalQ(cl, d.seq <- d.seq)
new_element <- parallel::parLapply(cl = cl, for(i in 1:30){
npcausal::ipsi(a = a[[i]],
y = y[[i]],
id = id[[i]],
time = time[[i]],
x.out = x.out[[i]],
x.trt = x.trt[[i]],
delta.seq = d.seq[[i]],
nsplits = 10)
})
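This actually runs, but at the end of the process it gives me an error saying that FUN is missing. I understand that, but I don't have any FUN to call other than ipsi. Thanks for any help you can provide.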
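The missing-`FUN` error is most likely caused by how `parLapply()` is called. Its signature is `parLapply(cl, X, fun, ...)`: it needs something to iterate over (`X`) and a function to apply to each element (`fun`). In the call above, the whole `for` loop is passed as `X`, so R evaluates the loop in the main session (running all 30 fits one after another, which is why it appears to work), the loop's value is `NULL`, its results are discarded, and no `fun` is ever supplied. A minimal sketch of the intended pattern iterates over the indices `1:30` and wraps the call in a function. To keep it runnable without npcausal, `toy_ipsi()` is a hypothetical stand-in for `npcausal::ipsi()`, and the inputs are small made-up data.

```r
library(parallel)

# Hypothetical stand-in for npcausal::ipsi(); it returns one meaningless
# number per delta value. nsplits is accepted but ignored here.
toy_ipsi <- function(a, y, delta.seq, nsplits) {
  vapply(delta.seq, function(d) mean(a) * d + mean(y), numeric(1))
}

# Made-up inputs: 30 copies of each argument, mirroring the question's setup.
n_sets <- 30
a     <- rep(list(rbinom(100, 1, 0.5)), n_sets)
y     <- rep(list(rnorm(100, mean = 1)), n_sets)
d.seq <- rep(list(seq(0.1, 5, length.out = 10)), n_sets)

cl <- makeCluster(2)
# Workers start as empty R sessions, so ship the function and data to them.
clusterExport(cl, c("toy_ipsi", "a", "y", "d.seq"), envir = environment())

# X is the vector of indices; fun is called once per index on a worker.
results <- parLapply(cl, seq_len(n_sets), function(i) {
  toy_ipsi(a = a[[i]], y = y[[i]], delta.seq = d.seq[[i]], nsplits = 10)
})

stopCluster(cl)
length(results)  # 30, one result per data set
```

With the real function, the workers would also need `library(npcausal)` (or the `npcausal::ipsi` prefix) and all seven argument lists exported.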
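My suggestion is to first figure out how to do this with a regular base-R *apply function, without worrying about parallelization. I suspect you can use mapply() for this, so something like (untested):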
res <- mapply(
  a, y, id, time, x.out, x.trt, d.seq,
  FUN = function(a_i, y_i, id_i, time_i, x.out_i, x.trt_i, d.seq_i) {
    npcausal::ipsi(a = a_i, y = y_i, id = id_i, time = time_i,
                   x.out = x.out_i, x.trt = x.trt_i, delta.seq = d.seq_i,
                   nsplits = 10)
  },
  SIMPLIFY = FALSE  # keep a list with one ipsi() result per data set
)
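Because `npcausal::ipsi()` needs the package and realistic data, here is a self-contained check of the same `mapply()` pattern. `toy_ipsi()` is a hypothetical stand-in with the same argument names, and the inputs are small made-up data. `mapply()` takes the i-th element of every list and passes them together to the function, so each call receives one whole matrix for `x.out` and `x.trt` (they must be lists of matrices, not bare matrices, or `mapply()` would iterate over individual cells).

```r
# Hypothetical stand-in for npcausal::ipsi(); id, time, x.out, x.trt and
# nsplits are accepted but ignored, and the result is a meaningless summary.
toy_ipsi <- function(a, y, id, time, x.out, x.trt, delta.seq, nsplits) {
  vapply(delta.seq, function(d) mean(a) * d + mean(y), numeric(1))
}

# Made-up inputs: 30 copies of each argument (20 rows = 5 ids x 4 times).
n_sets <- 30
a     <- rep(list(rbinom(20, 1, 0.5)), n_sets)
y     <- rep(list(rnorm(20, mean = 1)), n_sets)
id    <- rep(list(rep(1:5, each = 4)), n_sets)
time  <- rep(list(rep(1:4, 5)), n_sets)
x.out <- rep(list(matrix(rnorm(20 * 5), nrow = 20)), n_sets)
x.trt <- rep(list(matrix(rnorm(20 * 5), nrow = 20)), n_sets)
d.seq <- rep(list(seq(0.1, 5, length.out = 10)), n_sets)

res <- mapply(
  function(a_i, y_i, id_i, time_i, x.out_i, x.trt_i, d.seq_i) {
    toy_ipsi(a = a_i, y = y_i, id = id_i, time = time_i,
             x.out = x.out_i, x.trt = x.trt_i, delta.seq = d.seq_i,
             nsplits = 10)
  },
  a, y, id, time, x.out, x.trt, d.seq,
  SIMPLIFY = FALSE  # return a plain list, one element per data set
)

length(res)  # 30
```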
Once you've figured that part out, you can start thinking about parallelization.
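(Disclaimer: I'm the author.) Once you have a working mapply() solution, the easiest way to parallelize it is to replace it with future_mapply() from the future.apply package. If you set plan(multisession), this will run in parallel on your local machine.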
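A minimal sketch of that swap, assuming the future and future.apply packages are installed. `toy_ipsi()` is again a hypothetical stand-in for `npcausal::ipsi()`, trimmed to three arguments to keep the example short. future identifies the globals (here `toy_ipsi`) and packages that the function uses and sends them to the workers, so there is no `clusterExport()` step.

```r
library(future)
library(future.apply)

# Hypothetical stand-in for npcausal::ipsi(); nsplits is accepted but ignored.
toy_ipsi <- function(a, y, delta.seq, nsplits) {
  vapply(delta.seq, function(d) mean(a) * d + mean(y), numeric(1))
}

# Made-up inputs: 30 copies of each argument.
n_sets <- 30
a     <- rep(list(rbinom(100, 1, 0.5)), n_sets)
y     <- rep(list(rnorm(100, mean = 1)), n_sets)
d.seq <- rep(list(seq(0.1, 5, length.out = 10)), n_sets)

plan(multisession, workers = 2)  # two background R sessions on this machine

res <- future_mapply(
  function(a_i, y_i, d.seq_i) {
    toy_ipsi(a = a_i, y = y_i, delta.seq = d.seq_i, nsplits = 10)
  },
  a, y, d.seq,
  SIMPLIFY = FALSE,
  future.seed = TRUE  # parallel-safe RNG streams, for functions that draw random numbers
)

plan(sequential)  # shut the background workers down
length(res)  # 30
```

`future.seed = TRUE` matters for the real `ipsi()`, which presumably draws random numbers when splitting the sample for `nsplits`; it gives each element its own reproducible random-number stream.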