Using parSapply to generate random numbers

Question

I am trying to run a function that has a random number generator inside it. The results were not what I expected, so I ran the following test:

# Case 1
set.seed(100)
A1 = matrix(NA,20,10)
for (i in 1:10) {
  A1[,i] = sample(1:100,20)
}

# Case 2
set.seed(100)
A2 = sapply(seq_len(10),function(x) sample(1:100,20))

# Case 3
require(parallel)
set.seed(100)
cl <- makeCluster(detectCores() - 1)
A3 = parSapply(cl,seq_len(10), function(x) sample(1:100,20))
stopCluster(cl)

# Check: Case 1 result equals Case 2 result
identical(A1,A2)
# [1] TRUE

# Check: Case 1 result does NOT equal Case 3 result
identical(A1,A3)
# [1] FALSE

# Check2: Would like to check if it's a matter of ordering
range(rowSums(A1))
# [1] 319 704

range(rowSums(A3))
# [1] 288 612

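In the code above, parSapply generates a different set of random numbers than A1 and A2. The purpose of Check2 was to test my suspicion that parSapply might merely change the order of the results, but that does not seem to be the case, since the maximum and minimum row sums differ.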

I would appreciate it if someone could explain why parSapply gives different results from sapply. What am I missing here?

Thanks in advance!

Answer 1

Have a look at vignette("parallel"), in particular "Section 6 Random-number generation". Among other things, it states the following:

Some care is needed with parallel computation using (pseudo-)random numbers: the processes/threads which run separate parts of the computation need to run independent (and preferably reproducible) random-number streams.

When an R process is started up it takes the random-number seed from the object .Random.seed in a saved workspace or constructs one from the clock time and process ID when random-number generation is first used (see the help on RNG). Thus worker processes might get the same seed because a workspace containing .Random.seed was restored or the random number generator has been used before forking: otherwise these get a non-reproducible seed (but with very high probability a different seed for each worker).

You should also have a look at ?clusterSetRNGStream.
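As a minimal sketch of how clusterSetRNGStream could be used here: the helper name run_once and the fixed worker count of 2 are assumptions made for illustration, not part of the question. Calling set.seed() in the main session does not seed the workers, so the workers are seeded explicitly instead.

```r
library(parallel)

# Hypothetical helper: run the parallel sampling with explicitly seeded workers.
run_once <- function(seed, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl))  # always shut the workers down
  # Give each worker its own L'Ecuyer-CMRG stream derived from `seed`.
  clusterSetRNGStream(cl, iseed = seed)
  parSapply(cl, seq_len(10), function(x) sample(1:100, 20))
}

B1 <- run_once(100)
B2 <- run_once(100)

# Two runs with the same seed and the same number of workers match.
identical(B1, B2)
```

With the same seed and the same number of workers, the two runs should be identical. The result is still not expected to match A1 or A2 from the question, because the workers use a different generator and separate streams, and it can change if the number of workers changes.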
