Dynamically update input dataframe at each iteration of function without global assignment

Question

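I have (1) a reference table of ratings and (2) a function that randomly generates a result based on those ratings and then updates the ratings according to that result.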

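The reproducible example below has simpler solutions, but the intended application is to simulate results between opponents based on their Elo ratings, updating the ratings after each round so that the simulation runs 'hot'.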
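For context, the Elo rule such a simulation would apply after each round is usually written as E_A = 1 / (1 + 10^((R_B - R_A) / 400)) for the expected score and R_A' = R_A + K * (S_A - E_A) for the update. A minimal R sketch of one update; elo_update and the K-factor of 20 are assumptions for illustration, not part of the question:

```r
# Hypothetical helper (not from the question): one standard Elo update.
# s_a is player a's actual score (1 = win, 0.5 = draw, 0 = loss);
# k = 20 is an assumed K-factor.
elo_update <- function(r_a, r_b, s_a, k = 20) {
  # expected score of a against b under the Elo logistic curve
  e_a <- 1 / (1 + 10^((r_b - r_a) / 400))
  # a gains exactly what b loses, so the rating total is conserved
  delta <- k * (s_a - e_a)
  c(a = r_a + delta, b = r_b - delta)
}

elo_update(1500, 1500, s_a = 1)  # returns c(a = 1510, b = 1490)
```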

Here I have a reference table of ratings, ref, and use the function genResult to generate a random result and update the reference table through global assignment.

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult <- function(ref) {

  # floor(runif(1, 1, 5)) is always 1-4, so id E is never drawn
  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  # assign('ref', ref, envir=.GlobalEnv)
  ref <<- ref  # write the updated table back to the global ref

  return(list(result_i, ref))
}

Replicating this function twice, we can see that ref is updated as expected:

replicate(2, genResult(ref), simplify = F)

This returns the following, showing that the reference table is updated in each of the two iterations:

[[1]]
[[1]][[1]]
  id score
1  A     1

[[1]][[2]]
  id rating
1  A    130
2  B    179
3  C    141
4  D    188
5  E    194


[[2]]
[[2]][[1]]
  id score
1  C    -2

[[2]][[2]]
  id rating
1  A    130
2  B    179
3  C    139
4  D    188
5  E    194
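The global assignment is needed in the first place because R copies on modify: when a function changes a data frame it received as an argument, it changes its own copy and the caller's object stays the same. A minimal illustration (df and bump are made-up names for this sketch):

```r
df <- data.frame(id = c("A", "B"), rating = c(100, 150))

# Changes the rating of `who` in the function's local copy of `d`
bump <- function(d, who, by) {
  d[d$id == who, "rating"] <- d[d$id == who, "rating"] + by
  d  # returning the copy is the only way the change leaves the function
}

out <- bump(df, "A", 10)
df$rating   # caller's object is unchanged: 100 150
out$rating  # the returned copy carries the update: 110 150
```

So instead of writing back with <<-, the caller can keep the returned copy, for example df <- bump(df, "A", 10), and pass it into the next call.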

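Now suppose I want to replicate the (replicated) function above: simulate 3 independent instances of 5 results each, with dynamically updated ratings, and output only the results. Again I set up the reference table ref and define a similar function that uses global assignment: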

set.seed(123)
ref <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))

genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  ref <<- ref

  return(result_i)
}

Then I loop with lapply and collapse the list of results into a data frame:

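library(dplyr)  # for %>% and mutate()

lapply(1:3, function(i) {

  # note: ref_i is taken once per instance; genResult2 updates the global ref,
  # not ref_i, so the 5 calls below all start from the same ratings
  ref_i <- ref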

  replicate(5, genResult2(ref_i), simplify = F) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

This returns:

   id score i
1   A     1 1
2   C    -2 1
3   B     9 1
4   A    26 1
5   A    -9 1
6   D    10 2
7   D     8 2
8   C     5 2
9   A    36 2
10  C    17 2
11  B    14 3
12  B   -15 3
13  B    -4 3
14  A   -22 3
15  B   -13 3

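This seems to work as intended, but (i) it feels really ugly, and (ii) I have read countless times that global assignment can, and will, cause unintended harm.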

Can anyone suggest a better solution?

Answer 1

You can create a new environment with new.env() and do the computation there. Applying this idea to your first function gives:

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                  rating = round(runif(5, 100, 200)))
ref1

# a dedicated environment that holds the mutable ratings table
refEnv <- new.env()
refEnv$ref <- ref1

genResult <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(list(result_i, ref))
}
replicate(2, genResult(refEnv$ref), simplify = F)

ref1
refEnv$ref

You will see that the original ref1 is left untouched, while refEnv$ref contains the result of the last iteration.
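A closely related variant, not part of the original answer, keeps the environment private by using a closure: a function factory whose returned function updates the table in its own enclosing environment instead of one reachable from the global workspace. A minimal sketch; make_sim, step and ratings are hypothetical names, and drawing from all five ids with sample() is an assumption that differs from the question's A-D draw:

```r
# Hypothetical function factory (not from either answer): the ratings table
# lives in the environment of make_sim(), not in the global environment.
make_sim <- function(ref) {
  force(ref)
  step <- function() {
    id_i <- sample(ref$id, 1)            # assumption: any id may be drawn
    score_i <- round(rnorm(1, 0, 20))
    updated <- ref                       # copy of the enclosed table
    hit <- updated$id == id_i
    updated$rating[hit] <- updated$rating[hit] + score_i
    ref <<- updated                      # <<- finds ref in make_sim()'s environment
    data.frame(id = id_i, score = score_i)
  }
  list(step = step, ratings = function() ref)
}

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5], rating = round(runif(5, 100, 200)))
sim <- make_sim(ref1)
results <- do.call(rbind, replicate(5, sim$step(), simplify = FALSE))
sim$ratings()   # updated ratings; ref1 itself is unchanged
```

Each make_sim(ref1) call gets its own copy of the table, so independent instances are simply separate calls to make_sim().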

And implementing it in your second function, using lapply:

library(dplyr)  # for %>% and mutate()

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5],
                   rating = round(runif(5, 100, 200)))
ref1

# a dedicated environment that holds the mutable ratings table
refEnv <- new.env()
refEnv$ref <- ref1


genResult2 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  assign('ref', ref, envir=refEnv)

  return(result_i)
}

# 3 instances of 5 results each; refEnv$ref is re-read on every call,
# so the ratings keep accumulating, also from one instance to the next
lapply(1:3, function(i) {

  replicate(5, genResult2(refEnv$ref), simplify = F) %>% 
    plyr::rbind.fill() %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

ref1
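With the lapply code above, refEnv$ref is never reset, so each instance continues from the ratings left by the previous one. If the three instances are meant to be independent, one option (a sketch, not part of the original answer) is to reset the environment at the start of each instance. genResult_env is a hypothetical variant of genResult2 that takes the environment itself, and only base R is used:

```r
# Hypothetical variant of genResult2 (not from the answer) that receives the
# environment and updates env$ref in place.
genResult_env <- function(env) {
  id_i <- LETTERS[floor(runif(1, 1, 5))]
  score_i <- round(rnorm(1, 0, 20))
  hit <- env$ref$id == id_i
  env$ref$rating[hit] <- env$ref$rating[hit] + score_i
  data.frame(id = id_i, score = score_i)
}

set.seed(123)
ref1 <- data.frame(id = LETTERS[1:5], rating = round(runif(5, 100, 200)))
refEnv <- new.env()

out <- do.call(rbind, lapply(1:3, function(i) {
  refEnv$ref <- ref1   # reset so each instance starts from the original ratings
  res <- do.call(rbind, replicate(5, genResult_env(refEnv), simplify = FALSE))
  res$i <- i
  res
}))
out
```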

Answer 2

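If you are iterating and each iteration depends on the previous one, that is usually a good sign that you should use an old-fashioned for loop instead of replicate or an apply function (another option would have been Reduce with its accumulate argument set to TRUE).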
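The Reduce route mentioned above could look roughly like this sketch (an illustration under assumptions, not the answerer's code): the state threaded through the iterations is a list holding the current ratings and the latest result, and accumulate = TRUE keeps every intermediate state. The step function and the state layout are hypothetical:

```r
set.seed(123)
ref <- data.frame(id = LETTERS[1:5], rating = round(runif(5, 100, 200)))

# Hypothetical step: takes the previous state (ratings + last result) and
# returns the next one; k is just the step index from Reduce's x argument.
step <- function(state, k) {
  ref <- state$ref
  id_i <- LETTERS[floor(runif(1, 1, 5))]   # same draw as the question (ids A-D)
  score_i <- round(rnorm(1, 0, 20))
  ref$rating[ref$id == id_i] <- ref$rating[ref$id == id_i] + score_i
  list(ref = ref, result = data.frame(id = id_i, score = score_i, k = k))
}

# With init supplied, accumulate = TRUE returns init followed by all 5 states
states <- Reduce(step, 1:5, accumulate = TRUE,
                 init = list(ref = ref, result = NULL))
results <- do.call(rbind, lapply(states[-1], `[[`, "result"))
results
```

With set.seed(123) the draws match the first instance in the question, because runif() and rnorm() are called in the same order.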

Using a for loop, and making your function also return ref, gives the same ids and scores as the code you posted, plus the updated rating:

genResult3 <- function(ref) {

  id_i <- LETTERS[floor(runif(1, 1, 5))]

  score_i <- round(rnorm(1, 0, 20))

  ref[ref$id == id_i,]$rating <- ref[ref$id == id_i,]$rating + score_i

  result_i <- data.frame(id = id_i, score = score_i)

  #ref <<- ref

  return(list(result_i,ref)) # added ref to output
}

library(dplyr)  # for %>%, left_join() and mutate()

lapply(1:3, function(i) {
  res <- vector("list", 5)
  for (k in 1:5){
    gr <- genResult3(ref)  # on k = 1 this reads the global (original) ref
    res[[k]] <- gr[[1]] # get result
    ref      <- gr[[2]] # update rating (local copy only)
    res[[k]] <- left_join(res[[k]], ref, by = "id") # combine for output
  }
  plyr::rbind.fill(res) %>% 
    mutate(i)

}) %>% 
  plyr::rbind.fill()

This returns:

   id score rating i
1   A     1    130 1
2   C    -2    139 1
3   B     9    188 1
4   A    26    156 1
5   A    -9    147 1
6   D    10    198 2
7   D     8    206 2
8   C     5    146 2
9   A    36    165 2
10  C    17    163 2
11  B    14    193 3
12  B   -15    178 3
13  B    -4    174 3
14  A   -22    107 3
15  B   -13    161 3
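The same for-loop idea can be wrapped in a function that takes the starting table as an argument, so nothing is read from or written to the global environment and each instance is independent by construction. A base-R sketch without plyr or dplyr; run_instance is a hypothetical name, genResult3 is restated with equivalent indexing so the block runs on its own, and R 4.0 or later is assumed so that id columns stay character:

```r
# genResult3 as in the answer: returns the result and the updated table
genResult3 <- function(ref) {
  id_i <- LETTERS[floor(runif(1, 1, 5))]
  score_i <- round(rnorm(1, 0, 20))
  hit <- ref$id == id_i
  ref$rating[hit] <- ref$rating[hit] + score_i
  list(data.frame(id = id_i, score = score_i), ref)
}

# Hypothetical helper: n dependent steps starting from `ref`; one row per
# step, with the drawn id's rating after that step (assumes R >= 4.0)
run_instance <- function(ref, n) {
  res <- vector("list", n)
  for (k in seq_len(n)) {
    gr <- genResult3(ref)
    ref <- gr[[2]]                      # carry the ratings forward (local only)
    row <- gr[[1]]
    row$rating <- ref$rating[ref$id == row$id]
    res[[k]] <- row
  }
  do.call(rbind, res)
}

set.seed(123)
ref <- data.frame(id = LETTERS[1:5], rating = round(runif(5, 100, 200)))
out <- do.call(rbind, lapply(1:3, function(i) cbind(run_instance(ref, 5), i = i)))
out
```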

Tags: loops, r, function, global-variables, variable-assignment