What can substitute a nested loop in R

Question

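I want to create a data frame output from a data frame input in R by running multiple scenarios over two variables, x and y. The results column of output is the sum of all values in the value column where xcol < x & ycol < y.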

input =
xcol ycol value
1   5   4
2   6   9
3   7   8
4   9   7
5   14  8

and

output= 
x   y   results
2   5   0
2   10  4
2   15  35
...
6   5   0
6   10  27
6   15  35

My code currently looks like this:

for (x in 2:6) {
  if (x%% 2){
    next
  }
  for (y in 5:15) {
    if (y %% 5){
      next
    }
    print(x)
    print(y)
    print(sum(input$value[!is.na(input$xcol) & !is.na(input$ycol) & !is.na(input$value) & 
              input$xcol < x &  input$ycol < y]))
  }
}

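I think there should be a better way to replace this nested loop, using lapply & sapply, and to build a data frame from the results. Any help would be appreciated.

Thanks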

Answer 1

This is not the most elegant solution, but you can replace the for loops with lapply:

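# invisible() suppresses printing of the nested list that lapply returns
invisible(lapply(2:6, function(x) {
  # `next` is only valid inside a loop, so return early from the function instead
  if (x %% 2) {
    return(NULL)
  }
  lapply(5:15, function(y) {
    if (y %% 5) {
      return(NULL)
    }
    print(x)
    print(y)
    print(sum(input$value[!is.na(input$xcol) & !is.na(input$ycol) & !is.na(input$value) &
              input$xcol < x & input$ycol < y]))
  })
}))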
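
To get a data frame back instead of printed values, one option is to have the inner function return a one-row data frame and bind the pieces together. This is only a minimal sketch, assuming the input data frame from the question; the names xs, ys, rows and keep are illustrative, and complete.cases is used here in place of the three is.na checks.

```r
# Input data frame from the question
input <- data.frame(
  xcol  = c(1, 2, 3, 4, 5),
  ycol  = c(5, 6, 7, 9, 14),
  value = c(4, 9, 8, 7, 8)
)

xs <- seq(2, 6, by = 2)    # even x values: 2, 4, 6
ys <- seq(5, 15, by = 5)   # multiples of 5: 5, 10, 15

# Outer lapply over x, inner lapply over y;
# each inner call returns a one-row data frame
rows <- lapply(xs, function(x) {
  inner <- lapply(ys, function(y) {
    keep <- complete.cases(input) & input$xcol < x & input$ycol < y
    data.frame(x = x, y = y, results = sum(input$value[keep]))
  })
  do.call(rbind, inner)
})

# Stack the per-x pieces into one data frame with 9 rows
output <- do.call(rbind, rows)
output
```

Filtering xs and ys up front means the functions no longer need an early exit for the skipped values.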

Answer 2

In a sense, this looks more like a design of experiments, where you iterate over the different possible values of x and y.

xs <- 2:6
ys <- 5:15
eg <- expand.grid(x = xs, y = ys)
head(eg)
#   x y
# 1 2 5
# 2 3 5
# 3 4 5
# 4 5 5
# 5 6 5
# 6 2 6

I think your %% filtering should be done outside/before this, so:

xs <- xs[!xs %% 2]
ys <- ys[!ys %% 5]
eg <- expand.grid(x = xs, y = ys)
head(eg)
#   x  y
# 1 2  5
# 2 4  5
# 3 6  5
# 4 2 10
# 5 4 10
# 6 6 10

From here, you can iterate over the rows:

eg$out <- sapply(seq_len(nrow(eg)), function(r) {
  sum(input$value[ complete.cases(input) & input$xcol < eg$x[r] & input$ycol < eg$y[r] ])
})
eg
#   x  y out
# 1 2  5   0
# 2 4  5   0
# 3 6  5   0
# 4 2 10   4
# 5 4 10  21
# 6 6 10  28
# 7 2 15   4
# 8 4 15  21
# 9 6 15  36

I think your output variable is a bit off, since "2,15" should only include input$value[1] (xcol < 2 is the limiting factor). (There are other differences.)
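
The "2,15" case can be checked by hand. A small sketch, assuming the input data frame from the question (sel is just an illustrative name):

```r
# Input data frame from the question
input <- data.frame(
  xcol  = c(1, 2, 3, 4, 5),
  ycol  = c(5, 6, 7, 9, 14),
  value = c(4, 9, 8, 7, 8)
)

# x = 2, y = 15: only the first row has xcol < 2
sel <- input$xcol < 2 & input$ycol < 15
which(sel)              # 1
sum(input$value[sel])   # 4

# x = 6, y = 15: every row qualifies, so the result is the column total
sum(input$value[input$xcol < 6 & input$ycol < 15])   # 36
```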

Regardless of your actual indexing logic, I'd recommend this approach over a double-for or double-lapply implementation.
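
The same row-wise idea can also be written with mapply, which walks the x and y columns in parallel instead of indexing by row number. This is a sketch rather than a recommendation, assuming the input data frame from the question; ok and sum_below are hypothetical names.

```r
# Input data frame from the question
input <- data.frame(
  xcol  = c(1, 2, 3, 4, 5),
  ycol  = c(5, 6, 7, 9, 14),
  value = c(4, 9, 8, 7, 8)
)

# Grid of scenarios, already restricted to even x and multiples of 5 for y
eg <- expand.grid(x = seq(2, 6, by = 2), y = seq(5, 15, by = 5))

# Compute the NA filter once, outside the per-scenario function
ok <- complete.cases(input)

# Sum of value over rows strictly below both thresholds
sum_below <- function(x, y) {
  sum(input$value[ok & input$xcol < x & input$ycol < y])
}

# mapply calls sum_below(eg$x[i], eg$y[i]) for each row i
eg$out <- mapply(sum_below, eg$x, eg$y)
eg
```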

Notes:

These commands are functionally equivalent for input:
```
complete.cases(input)                                         # 1
complete.cases(input[c("xcol","ycol","value")])               # 2
!is.na(input$xcol) & !is.na(input$ycol) & !is.na(input$value) # 3
```
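I went with the first one out of "code golf", but if your actual input data.frame contains other columns, you may prefer the second, because it lets you choose which columns must be non-NA.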
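
A small sketch of where the first two forms differ, using a made-up data frame input2 with a hypothetical extra column note that contains a missing value:

```r
# Hypothetical data: same three columns plus an unrelated column with an NA
input2 <- data.frame(
  xcol  = c(1, 2, 3),
  ycol  = c(5, 6, 7),
  value = c(4, 9, 8),
  note  = c("a", NA, "c")
)

# Form 1 looks at every column, so row 2 is dropped because of `note`
all_cols <- complete.cases(input2)
all_cols    # TRUE FALSE TRUE

# Form 2 looks only at the columns that matter for the sum
key_cols <- complete.cases(input2[c("xcol", "ycol", "value")])
key_cols    # TRUE TRUE TRUE
```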
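expand.grid works well for this type of expansion. However, if you are looking at significantly larger data sets (including cases where your filtering is more complex than what %% offers), it can get a bit expensive, because it has to create the entire data.frame in memory. Python's use of lazy iterators would be useful here, in which case you may prefer a lazy approach (an expanded function with some docs is in a GitHub gist: https://gist.github.com/r2evans/e5531cbab8cf421d14ed).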
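
To illustrate the lazy idea, the (x, y) pair for any grid row can be computed from its index on demand, so the full grid is never built. This is a simplified sketch and not the function from the gist; grid_row is a hypothetical helper, and input is assumed to be the data frame from the question.

```r
# Input data frame from the question
input <- data.frame(
  xcol  = c(1, 2, 3, 4, 5),
  ycol  = c(5, 6, 7, 9, 14),
  value = c(4, 9, 8, 7, 8)
)
ok <- complete.cases(input)

xs <- seq(2, 6, by = 2)
ys <- seq(5, 15, by = 5)
n  <- length(xs) * length(ys)   # number of scenarios

# Map a 1-based index to the pair that expand.grid(x = xs, y = ys)
# would place in that row (x varies fastest)
grid_row <- function(i) {
  c(x = xs[(i - 1) %% length(xs) + 1],
    y = ys[(i - 1) %/% length(xs) + 1])
}

# Only one (x, y) pair exists at a time; the result vector is still stored
out <- vapply(seq_len(n), function(i) {
  p <- grid_row(i)
  sum(input$value[ok & input$xcol < p[["x"]] & input$ycol < p[["y"]]])
}, numeric(1))
out
```

For a grid of nine rows this gains nothing over expand.grid; the index arithmetic only starts to matter when the full grid would be too large to hold comfortably in memory.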
