Using parSapply to generate random numbers
I am trying to run a function that uses a random number generator. The results were not what I expected, so I ran the following test:
# Case 1
set.seed(100)
A1 = matrix(NA,20,10)
for (i in 1:10) {
  A1[,i] = sample(1:100,20)
}
# Case 2
set.seed(100)
A2 = sapply(seq_len(10),function(x) sample(1:100,20))
# Case 3
library(parallel)
set.seed(100)
cl <- makeCluster(detectCores() - 1)
A3 = parSapply(cl,seq_len(10), function(x) sample(1:100,20))
stopCluster(cl)
# Check: Case 1 result equals Case 2 result
identical(A1,A2)
# [1] TRUE
# Check: Case 1 result does NOT equal Case 3 result
identical(A1,A3)
# [1] FALSE
# Check2: Would like to check if it's a matter of ordering
range(rowSums(A1))
# [1] 319 704
range(rowSums(A3))
# [1] 288 612
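In the code above, parSapply produces a different set of random numbers from A1 and A2. I added Check2 because I suspected parSapply might simply change the order of the columns, but that does not seem to be the case: reordering the columns would leave every row sum unchanged, yet the minimum and maximum row sums differ.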
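A small sketch, not part of the original question, of why Check2 works and of a stricter alternative. Row sums are unaffected by column order, so they rule out a column reordering; comparing the sorted values rules out any reordering of the numbers at all. The object names A1_cols and A1_all are illustrative only.

```r
# Rebuild A1 as in Case 2.
set.seed(100)
A1 <- sapply(seq_len(10), function(x) sample(1:100, 20))

# Shuffle the columns (e.g. tasks coming back in a different order).
A1_cols <- A1[, sample(ncol(A1))]
identical(rowSums(A1), rowSums(A1_cols))                  # TRUE: row sums ignore column order

# Shuffle every element: row sums change, but the sorted values do not.
A1_all <- matrix(sample(A1), nrow = nrow(A1))
identical(sort(as.vector(A1)), sort(as.vector(A1_all)))   # TRUE: same numbers, different order
```

The same sorted comparison can be run on A1 and A3 from the question to test whether parSapply returned the same numbers in a different order.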
I'd appreciate it if someone could explain why parSapply gives different results from sapply. What am I missing here?
Thanks in advance!
Take a look at vignette("parallel"), especially "Section 6 Random-number generation". Among other things, it states the following:
Some care is needed with parallel computation using (pseudo-)random numbers: the processes/threads which run separate parts of the computation need to run independent (and preferably reproducible) random-number streams.
When an R process is started up it takes the random-number seed from the object .Random.seed in a saved workspace or constructs one from the clock time and process ID when random-number generation is first used (see the help on RNG). Thus worker processes might get the same seed because a workspace containing .Random.seed was restored or the random number generator has been used before forking: otherwise these get a non-reproducible seed (but with very high probability a different seed for each worker).
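A minimal sketch, not from the original answer, of the behaviour the quote describes: set.seed() on the master does not seed the workers of a PSOCK cluster (the default type for makeCluster()), so two freshly started clusters give different results even though the master seed is the same. The helper name run_once and the choice of 2 workers are illustrative assumptions.

```r
library(parallel)

# Hypothetical helper: start a fresh cluster, seed only the master,
# and run the same parSapply call as in Case 3 (with fewer tasks).
run_once <- function() {
  cl <- makeCluster(2)          # 2 workers, chosen only for this demo
  on.exit(stopCluster(cl))
  set.seed(100)                 # seeds the master's RNG, not the workers'
  parSapply(cl, seq_len(4), function(x) sample(1:100, 20))
}

r1 <- run_once()
r2 <- run_once()
identical(r1, r2)   # almost always FALSE: each worker seeds itself from time and PID
```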
You should also take a look at ?clusterSetRNGStream.
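A minimal sketch of the approach documented in ?clusterSetRNGStream, assuming a fixed cluster of 2 workers (run_reproducible is a hypothetical helper name). clusterSetRNGStream() gives each worker its own L'Ecuyer-CMRG stream derived from iseed, so rerunning with the same seed and the same number of workers reproduces the result.

```r
library(parallel)

# Hypothetical helper: seed every worker from one master seed, then run Case 3.
run_reproducible <- function(seed) {
  cl <- makeCluster(2)                    # fixed worker count (see note below)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)   # one independent stream per worker
  parSapply(cl, seq_len(10), function(x) sample(1:100, 20))
}

A3a <- run_reproducible(100)
A3b <- run_reproducible(100)
identical(A3a, A3b)   # TRUE: same seed and same number of workers
```

Two caveats: the numbers depend on how parSapply splits the tasks across workers, so a cluster built with detectCores() - 1 may give different results on machines with different core counts; and the result will not match A1 or A2, which come from the default Mersenne-Twister generator run as a single stream.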