Equality of two shuffling codes in R

Question

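I would like to know whether the following two shuffles of the four numbers 1:4 are equally random, or whether one might be preferable to the other in terms of randomness: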

sample(rep(1:4, 10))

replicate(10, sample(1:4))

Constraint:

Although the order is random, I need the same number of 1s, 2s, 3s and 4s.

Answer 1

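Now that I understand what you meant in the comments: the two lines of code are essentially equivalent in terms of randomness, since you get 10 instances of each category (1 to 4) either way.

The time to complete the task is essentially the same, since there are only 40 numbers in total.

However, sample(rep(1:4, 10)) returns an integer vector of length 40, whereas replicate(10, sample(1:4)) returns a 4x10 matrix in which each of the numbers 1 to 4 is drawn exactly once in every column.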
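To make the difference in shape concrete, here is a minimal sketch. The seed and the names v, m and m_vec are my own choices for illustration and are not part of the question.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

v <- sample(rep(1:4, 10))        # one shuffle of all 40 values
m <- replicate(10, sample(1:4))  # ten independent shuffles of 1:4

is.null(dim(v))  # TRUE: a plain vector without dimensions
length(v)        # 40
dim(m)           # 4 10

# Both satisfy the constraint of ten copies of each number
table(v)
table(m)

# Flatten the matrix column by column to get a vector of length 40
m_vec <- as.vector(m)
length(m_vec)    # 40
```

If a vector is what you need, as.vector() (or c()) on the matrix is enough; the values are read column by column.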

Answer 2

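These functions are not equal in any respect.

1. Output

f1() outputs a vector; f2() outputs a matrix.

As @RicS said, the first returns a vector and the second returns a matrix.

2. Time

f1() is almost 50x faster than f2().

The difference in run time becomes clearer at a larger scale (f2() is wrapped in c() below so that both functions return a vector of 40,000 values):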

set.seed(1701)

# Functions
f1 <- function() { sample(rep(1:4, 10000)) }
f2 <- function() { c(replicate(10000, sample(1:4))) }

# Benchmark
microbenchmark::microbenchmark(f1(), f2())
Unit: microseconds
 expr      min         lq       mean     median        uq       max neval cld
 f1()   671.28   820.6755   983.9417   988.7985  1046.476  2320.425   100  a 
 f2() 40588.03 43241.0270 48796.0141 45612.0740 54431.890 71117.415   100   b

We see that f1() is clearly faster, as @JosephClarkMcIntyre said in the comments.
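If microbenchmark is not installed, a rough comparison is possible with base R's system.time(). This is only a sketch: the 20 repetitions are an arbitrary choice of mine, and the absolute timings will differ from machine to machine.

```r
f1 <- function() sample(rep(1:4, 10000))
f2 <- function() c(replicate(10000, sample(1:4)))

# f1 makes a single call to sample() on 40000 values;
# f2 makes 10000 separate calls to sample() on 4 values each
t1 <- system.time(for (i in 1:20) x1 <- f1())["elapsed"]
t2 <- system.time(for (i in 1:20) x2 <- f2())["elapsed"]

c(f1 = unname(t1), f2 = unname(t2))  # elapsed seconds for 20 runs each
```

The gap most likely comes from the per-call overhead of running sample() thousands of times rather than from the shuffling itself.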

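But are they at least equal in terms of randomness? Let's test it!

3. Randomness

f2() is not random.

The Bartels rank test checks a sequence of numbers for randomness against the alternative of non-randomness.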

> randtests::bartels.rank.test(as.numeric(f1_result$value))

    Bartels Ratio Test

data:  as.numeric(f1_result$value)
statistic = -1.26, n = 40000, p-value = 0.2077
alternative hypothesis: nonrandomness

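The p-value is > 0.05, so the null hypothesis is not rejected.
The result of f1() is not shown to be non-random. (This is not the same as establishing that it is random.)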

> randtests::bartels.rank.test(as.numeric(f2_result$value))

    Bartels Ratio Test

data:  as.numeric(f2_result$value)
statistic = 50.017, n = 40000, p-value < 2.2e-16
alternative hypothesis: nonrandomness

The p-value is < 0.05, so the null hypothesis is rejected.
The result of f2() is non-random.
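To see what drives the two results, the rank von Neumann ratio that the Bartels test is built on can be computed by hand in base R. This is a minimal sketch: rvn(), x1 and x2 are names I made up, and I am assuming the packaged test reports a standardised version of this ratio, so the numbers here are not the same as the statistic printed above.

```r
set.seed(1701)

# Rank von Neumann ratio: squared differences between successive ranks,
# divided by the squared deviations of the ranks from their mean
rvn <- function(x) {
  r <- rank(x)  # tied values receive their average rank
  sum(diff(r)^2) / sum((r - mean(r))^2)
}

x1 <- sample(rep(1:4, 10000))           # same construction as f1()
x2 <- c(replicate(10000, sample(1:4)))  # same construction as f2()

rvn(x1)  # close to 2, the value expected without serial dependence
rvn(x2)  # clearly above 2: neighbours differ more often than chance predicts
```

A ratio above 2 means successive values differ more than they would in an unconstrained shuffle, which is what you expect when a number can never repeat inside its block of four.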

This is also evident if you look at the output of the function itself.

> set.seed(1701)
> replicate(10, sample(1:4))
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    4    1    3    3    2    3    3    4     1
[2,]    3    1    2    1    4    3    2    2    3     4
[3,]    4    2    3    2    1    1    4    4    2     2
[4,]    2    3    4    4    2    4    1    1    1     3

It produces a matrix with ten columns, each containing the numbers 1:4 exactly once. That is not random.
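The block structure can also be checked directly. In this sketch, is_perm_block() is a hypothetical helper of mine that cuts a sequence into consecutive groups of four and reports whether each group contains 1, 2, 3 and 4 exactly once.

```r
set.seed(1701)

# TRUE for every consecutive block of four values that is a permutation of 1:4
is_perm_block <- function(x) {
  blocks <- matrix(x, nrow = 4)  # one block per column
  apply(blocks, 2, function(col) identical(sort(col), 1:4))
}

x1 <- sample(rep(1:4, 10))           # one shuffle of all 40 values
x2 <- c(replicate(10, sample(1:4)))  # ten shuffled blocks, flattened

mean(is_perm_block(x2))  # 1: every block is a permutation of 1:4
mean(is_perm_block(x1))  # usually far below 1
```

So with the replicate() version, once you have seen three values of a block you know the fourth, which is not the case for a single shuffle of the whole vector.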
