Is there a way to do parallel processing in this R code?

Question

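I am trying to learn parallelism in R. I have written some code with a 500 * 500 matrix holding the values 1 to 250000. For each element of the matrix, I look for the neighbor with the lowest value. Neighbors can also be in diagonal positions. Then I replace the element itself with its lowest neighbor. Running this code takes about 4.5 seconds on my computer. If it is possible, can anyone help me make the for loops parallel? Here is the code snippet: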

start_time <- Sys.time()


myMatrix <- matrix(1:250000, nrow=500) # a 500 * 500 matrix from 1 to 250000


indexBound <- function(row,col) { # this function is to check if the indexes are out of bound
  if(row<1 || col <1 || row > 500 || col >500){ # R indexes from 1, so 0 is out of bounds too
    return (FALSE)
  }
  else{
    return (TRUE)
  }
}


for(row in 1:nrow(myMatrix)){
  
  for(col in 1:ncol(myMatrix)){
    li <- list()
    if(indexBound(row-1,col-1)){
      li <- c(li,myMatrix[row-1,col-1])
     
    }
    if(indexBound(row-1,col)){
      li <- c(li,myMatrix[row-1,col])
     
    }
    if(indexBound(row-1,col+1)){
      li <- c(li,myMatrix[row-1,col+1])
      
    }
    if(indexBound(row,col-1)){
      li <- c(li,myMatrix[row,col-1])
    }
    if(indexBound(row,col+1)){
      li <- c(li,myMatrix[row,col+1])
      
    }
    if(indexBound(row+1,col-1)){
      li <- c(li,myMatrix[row+1,col-1])
      
    }
    if(indexBound(row+1,col)){
      li <- c(li,myMatrix[row+1,col])
    
    }
    if(indexBound(row+1,col+1)){
      li <- c(li, myMatrix[row+1,col+1])
     
    }
    lowest <- Reduce(min,li) # find the lowest value from the list (avoids shadowing base min)
    myMatrix[row,col] <- lowest
  }
}
end_time <- Sys.time()

end_time - start_time

Thanks for your replies.

Answer 1

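Your script will produce a matrix in which every element equals 2, because each cell is overwritten in place and later cells then read the already-updated values. If that is not the intent, you should create a copy of myMatrix and read from that copy when building li (in the if statements).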
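A minimal sketch of the copy-based fix, on a 4 x 4 matrix so the result is easy to inspect. The names neighborMin, src, and out are hypothetical and do not come from the question; the loop reads neighbors only from the unmodified src and writes into out. It assumes the matrix has at least two cells, so that every cell has a neighbor.

```r
# Hypothetical helper: minimum over the up-to-8 neighbors of each cell.
# Reads from src and writes to out, so earlier writes never affect later reads.
neighborMin <- function(src) {
  nr <- nrow(src)
  nc <- ncol(src)
  out <- src                                     # results go here; src is never modified
  for (row in seq_len(nr)) {
    for (col in seq_len(nc)) {
      rows <- max(row - 1, 1):min(row + 1, nr)   # clamp the 3 x 3 window to the matrix
      cols <- max(col - 1, 1):min(col + 1, nc)
      window <- src[rows, cols, drop = FALSE]
      window[row - rows[1] + 1, col - cols[1] + 1] <- NA   # exclude the cell itself
      out[row, col] <- min(window, na.rm = TRUE)
    }
  }
  out
}

small <- matrix(1:16, nrow = 4)
result <- neighborMin(small)
print(result)
```

Because every cell now depends only on the original input, the iterations are independent of each other, which is also the precondition for running them in parallel.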

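I realize this is probably a contrived example for exploring parallelization, but with R it is usually best to focus on vectorization first. Once vectorized, this operation may be fast enough that parallelization actually makes it slower because of overhead. For example, here is a vectorized solution that uses a padded matrix (it does not give all 2s, and it still excludes the current cell from the min calculation):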

library(matrixStats)

system.time({
  idxShift <- expand.grid(rep(list(-1:1), 2))[-5,] # don't include the current cell (0, 0)
  myMatrix <- matrix(nrow = 502, ncol = 502)
  myMatrix[2:501, 2:501] <- matrix(1:250000, nrow = 500)
  myMatrix <- matrix(rowMins(mapply(function(i,j) c(myMatrix[2:501 + i, 2:501 + j]), idxShift$Var1, idxShift$Var2), na.rm = TRUE), nrow = 500)
})

   user  system elapsed 
   0.03    0.00    0.03
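To make the padding trick concrete, here is a sketch of the same idea on a 4 x 4 matrix using base R only. It substitutes pmin for matrixStats::rowMins, so it is a variant of the code above and not the same code; shiftedMin and its variable names are hypothetical. The matrix is surrounded by an NA border, eight shifted copies are taken (one per neighbor direction), and the elementwise minimum ignores the NAs.

```r
# Hypothetical base-R variant of the padded-matrix approach.
shiftedMin <- function(m) {
  nr <- nrow(m)
  nc <- ncol(m)
  padded <- matrix(NA, nrow = nr + 2, ncol = nc + 2)     # NA border means "no neighbor here"
  padded[2:(nr + 1), 2:(nc + 1)] <- m
  shifts <- expand.grid(i = -1:1, j = -1:1)
  shifts <- shifts[!(shifts$i == 0 & shifts$j == 0), ]   # skip the cell itself
  # One shifted copy of the whole matrix per neighbor direction
  layers <- Map(function(i, j) padded[2:(nr + 1) + i, 2:(nc + 1) + j],
                shifts$i, shifts$j)
  # Elementwise minimum across the 8 layers, ignoring the NA border
  do.call(pmin, c(unname(layers), list(na.rm = TRUE)))
}

small <- matrix(1:16, nrow = 4)
res <- shiftedMin(small)
print(res)
```

Each shifted layer is extracted with a single indexing operation, so the work is done in compiled code over whole matrices instead of one R-level iteration per cell.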

Compare that with a parallel version of the same vectorized code, using future.apply:

library(future.apply)
plan(multisession)

system.time({
  idxShift <- expand.grid(rep(list(-1:1), 2))[-5,]
  myMatrix <- matrix(nrow = 502, ncol = 502)
  myMatrix[2:501, 2:501] <- matrix(1:250000, nrow = 500)
  myMatrix <- matrix(rowMins(future_mapply(function(i,j) c(myMatrix[2:501 + i, 2:501 + j]), idxShift$Var1, idxShift$Var2), na.rm = TRUE), nrow = 500)
})

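future:::ClusterRegistry("stop") # stops the background workers via an internal function; plan(sequential) is the documented way to do the same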

   user  system elapsed 
   0.10    0.05    2.11

Unless I have messed something up, the parallel solution is slower, even when the time for plan(multisession) is not included.
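For completeness, the explicit loop itself can be parallelized once it reads from an unmodified copy, since each cell then depends only on the input. The following is a hedged sketch using the base parallel package, with rows distributed over two local workers; rowNeighborMin and src are hypothetical names, and the worker count of 2 is an arbitrary assumption. Given the timings above, it should be expected to be slower than the vectorized version at this matrix size.

```r
library(parallel)

src <- matrix(1:250000, nrow = 500)

# Hypothetical worker function: neighbor minima for one row, reading only from src
rowNeighborMin <- function(row, src) {
  nr <- nrow(src)
  nc <- ncol(src)
  rows <- max(row - 1, 1):min(row + 1, nr)     # clamp the window to the matrix
  vapply(seq_len(nc), function(col) {
    cols <- max(col - 1, 1):min(col + 1, nc)
    window <- src[rows, cols, drop = FALSE]
    window[row - rows[1] + 1, col - cols[1] + 1] <- NA   # exclude the cell itself
    min(window, na.rm = TRUE)
  }, numeric(1))
}

cl <- makeCluster(2)                           # two local worker processes (assumed count)
rowsOut <- parLapply(cl, seq_len(nrow(src)), rowNeighborMin, src = src)
stopCluster(cl)

result <- do.call(rbind, rowsOut)              # one vector per row, stacked back into a matrix
```

Splitting by row keeps the number of tasks small (500 instead of 250000), which limits the communication overhead that dominates in the comparison above.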

