Using sample() to sample from nested lists in R

Question

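I am looking for a way to use sample() to draw values from different lists depending on the value in another column of a data.table. At the moment I get a "recursive indexing failed" error. Code and further explanation below: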

First, set up the example data:

library(stats)
library(data.table)

# list of three different nest survival rates
survival<-list(0.91,0.95,0.99)

# incubation period
inc.period<-28

# then set up function to use the geometric distribution to generate 3 lists of incubation outcomes based on the nest survivals and incubation period above.
# e.g. less than 28 is a nest failure, 28 is a successful nest.

create.sample <- function(survival){
  outcome<-rgeom(100,1-survival)
  fifelse(outcome > inc.period, inc.period, outcome)
}

# then create list of 100 nest outcomes with 3 different survival values using lapply 

inc.outcomes <- lapply(survival,create.sample)

# set up a data.table - each row of data will be a nest.

index<-c(1:3)
iteration<-1:20
dt = CJ(index,iteration)

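I then want to create a new column 'inc.period' that samples from the 'inc.outcomes' list, using the index column of dt to select which of the three 'inc.outcomes' lists to sample from (with a different sample for each row of data). So, for example, when index = 1 the sampled values come from inc.outcomes[[1]], which is the low nest survival list; when index = 2 I sample from inc.outcomes[[2]], and so on.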

The code would look something like this, but it does not work (I get a "recursive indexing failed" error):

dt[,inc.period:= sample(inc.outcomes[[index]],nrow(dt),replace = TRUE)]

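Any help or advice is gratefully received, as are suggestions for different approaches to this problem. This is to update code that runs in a Shiny simulation, so faster options are preferred!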

Answer 1

Two problems:

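inc.outcomes[[index]] is a problem because index is 60 long here, which means you end up attempting inc.outcomes[[ c(1,1,...,2,2,...,3,3) ]], which is not valid. An index passed to [[ must either be length 1 (for most uses) or a vector with one element per level of list nesting. For example, in list(list(1,2),list(3,4))[[ c(1,2) ]], the length-2 index [[c(1,2)]] works because there are 2 layers of nested lists. Since inc.outcomes is only 1 deep, the [[ index can only be length 1.
That means we need to do the sampling per index value, using by = index. (As a consequence, we need to change nrow(dt) to .N, though frankly we should be using .N even without by=.)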
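
The [[ behaviour described in the first point can be seen in a small sketch. The objects nested and flat are made up for illustration and are not part of the question's data; flat only mimics the shape of inc.outcomes (a list of plain vectors).

```r
# A list nested two levels deep (illustrative only)
nested <- list(list(1, 2), list(3, 4))

# A length-2 index walks both levels: first element, then its second element
nested[[c(1, 2)]]   # 2

# A flat list of vectors, similar in shape to inc.outcomes (illustrative only)
flat <- list(c(10, 11), c(20, 21), c(30, 31))

# A length-1 index selects one whole element
flat[[2]]           # the vector c(20, 21)

# A longer index asks [[ to descend through more list levels than exist,
# so R signals an error; tryCatch() captures it instead of stopping
res <- tryCatch(flat[[c(1, 1, 2)]], error = function(e) e)
inherits(res, "error")
```

Because index inside dt[...] without by= is the full 60-element column, the original call falls into the last case.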

dt[, inc.period := sample(inc.outcomes[[ index[1] ]], .N, replace = TRUE), by = index]
#     index iteration inc.period
#     <int>     <int>      <num>
#  1:     1         1         17
#  2:     1         2         17
#  3:     1         3         21
#  4:     1         4         24
#  5:     1         5          3
#  6:     1         6          1
#  7:     1         7         17
#  8:     1         8          0
#  9:     1         9          1
# 10:     1        10          0
# ---                           
# 51:     3        11          0
# 52:     3        12          0
# 53:     3        13         28
# 54:     3        14         28
# 55:     3        15          9
# 56:     3        16         28
# 57:     3        17          7
# 58:     3        18         28
# 59:     3        19         28
# 60:     3        20         28
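
As a rough sanity check of the grouped approach, the sketch below rebuilds the question's setup and confirms that every sampled value comes from the list matching its own index. The set.seed() call, the as.numeric() conversion, and the check table are assumptions added for this illustration; they are not part of the original question or answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

inc.period <- 28
survival <- list(0.91, 0.95, 0.99)

# Mirrors the question's construction; as.numeric() is an added assumption
# so that both branches of fifelse() have the same type
create.sample <- function(survival) {
  outcome <- as.numeric(rgeom(100, 1 - survival))
  fifelse(outcome > inc.period, inc.period, outcome)
}
inc.outcomes <- lapply(survival, create.sample)

dt <- CJ(index = 1:3, iteration = 1:20)

# Within each group, index[1] is a length-1 value, so [[ is valid,
# and .N is the number of rows in that group
dt[, inc.period := sample(inc.outcomes[[ index[1] ]], .N, replace = TRUE), by = index]

# Hypothetical check: each group's samples should all appear in its own list
check <- dt[, .(all_from_own_list = all(inc.period %in% inc.outcomes[[ index[1] ]])),
            by = index]
check
```

One caveat worth keeping in mind: sample(x, ...) treats a length-1 numeric x as 1:x, so this pattern relies on each element of inc.outcomes holding more than one value, which is the case here (100 each).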

My data:

dt <- setDT(structure(list(index = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), iteration = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,  11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L)), row.names = c(NA, -60L), class = c("data.table", "data.frame"), sorted = c("index", "iteration")))

