dplyr mutate using rbinom does not return random numbers

Question

I want to use mutate to compute a column drawn from a binomial distribution.

I have the following example:

library("dplyr")

d = data.frame(ref = rbinom(100,100,0.5))
d$coverage = 100
d$prob = 0.5
d$eprob= d$ref / d$coverage
d = as_tibble(d)  # tbl_df() is deprecated in current dplyr

mutate(d,
       ref1= ref,
       cov1 = coverage,
       eprob1 = eprob,
       ref2=rbinom(1, coverage, eprob),
       ref3=rbinom(1, cov1, eprob1)
       )

The result looks like this:

Source: local data frame [100 x 9]

   ref coverage prob eprob ref1 cov1 eprob1 ref2 ref3
1   52      100  0.5  0.52   52  100   0.52   45   44
2   50      100  0.5  0.50   50  100   0.50   45   44
3   45      100  0.5  0.45   45  100   0.45   45   44
4   45      100  0.5  0.45   45  100   0.45   45   44
5   47      100  0.5  0.47   47  100   0.47   45   44
6   46      100  0.5  0.46   46  100   0.46   45   44
7   50      100  0.5  0.50   50  100   0.50   45   44
8   53      100  0.5  0.53   53  100   0.53   45   44
9   44      100  0.5  0.44   44  100   0.44   45   44
10  56      100  0.5  0.56   56  100   0.56   45   44

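I don't understand this. I expected mutate to return, for each row, a random number drawn from the binomial distribution given by ref and coverage ("ref2")...

mutate reads the columns correctly, but something odd happens when rbinom is called...

Any help is appreciated.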

Answer 1

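rbinom(1, ...) draws a single value, which mutate then recycles across all rows. Try changing the n argument of rbinom so that it draws one value per row: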

mutate(d,
   ref1= ref,
   cov1 = coverage,
   eprob1 = eprob,
   ref2=rbinom(100, coverage, eprob),
   ref3=rbinom(100, cov1, eprob1)
)

Or, more generally, use n() so the number of draws always matches the number of rows:

mutate(d,
   ref1= ref,
   cov1 = coverage,
   eprob1 = eprob,
   ref2=rbinom(n(), coverage, eprob),
   ref3=rbinom(n(), cov1, eprob1)
)
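
To see the difference side by side, here is a minimal sketch. The data frame, the column names one_draw and per_row, and the seed are made up for illustration and are not from the question. It assumes a reasonably recent dplyr.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data: same coverage everywhere, a different probability per row
d <- data.frame(coverage = rep(100L, 10),
                eprob    = seq(0.05, 0.95, length.out = 10))

res <- d %>%
  mutate(
    one_draw = rbinom(1, coverage, eprob),    # a single draw, recycled to every row
    per_row  = rbinom(n(), coverage, eprob)   # one draw per row, using that row's parameters
  )

print(res)
```

The one_draw column should hold the same value in every row, while per_row should vary with eprob, because rbinom recycles size and prob to the length given by n.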

Answer 2

Another solution is:

d %>% rowwise() %>%
      mutate(ref1= ref,
             cov1 = coverage,
             eprob1 = eprob,
             ref2=rbinom(1, coverage, eprob),
             ref3=rbinom(1, cov1, eprob1))

Here rowwise() groups the data by row, so rbinom(1, ...) is evaluated once per row and draws one random value for each.
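
A small self-contained sketch of the rowwise() approach follows. The data and the seed are invented for illustration, and the trailing ungroup() is an addition not present in the answer above; it drops the row-wise grouping so that later verbs operate on whole columns again.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical data with a different success probability per row
d <- data.frame(coverage = rep(100L, 5),
                eprob    = c(0.1, 0.3, 0.5, 0.7, 0.9))

res <- d %>%
  rowwise() %>%                                 # evaluate the mutate once per row
  mutate(ref2 = rbinom(1, coverage, eprob)) %>% # one draw from that row's parameters
  ungroup()                                     # return to an ordinary tibble

print(res)
```

Since rbinom is already vectorized, the rbinom(n(), ...) form is likely to be faster on large data. rowwise() is mostly useful when the function being called cannot take vector arguments.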

r

dplyr