Getting "node stack overflow" when cbind-ing multiple sparse matrices

Question

I have 100,000 sparse matrices ("dgCMatrix") stored in a list object. Every matrix has the same number of rows (8,000,000), and the list is about 25 GB in size. Now when I do:

do.call(cbind, theListofMatrices)

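to combine all the matrices into one big sparse matrix, I get "node stack overflow". In fact, I can't even do this with just 500 elements of that list, which should produce a sparse matrix of only about 100 MB.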

My guess is that cbind() converts the sparse matrices into ordinary dense matrices and that this causes the stack overflow?

In fact, I tried this:

tmp = do.call(cbind, theListofMatrices[1:400])

This works fine: tmp is still a sparse matrix, about 95 MB in size. Then I tried:

> tmp = do.call(cbind, theListofMatrices[1:410])
Error in stopifnot(0 <= deparse.level, deparse.level <= 2) : 
  node stack overflow

and the error appeared. However, I can easily do the following:

cbind(tmp, tmp, tmp, tmp)

So I think it has something to do with do.call().

Reduce() seems to solve my problem, although I still don't know why do.call() crashes.

Answer 1

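In R, a 2-column matrix can have at most 2^30 - 1 = 1,073,741,823 rows. So I would check the number of rows, and check the available RAM to make sure it can hold a matrix of that size.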
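A minimal sketch of such a pre-flight check, assuming every element of the list is a "dgCMatrix". The helper name check_cbind_size() is hypothetical. It sums the column counts and the number of stored entries (the length of each @x slot) without performing the bind. Because the @p column-pointer slot of a dgCMatrix is an integer vector, the total number of stored entries must also stay at or below .Machine$integer.max (2^31 - 1). The byte estimate covers only the @i, @x and @p slots and ignores object overhead.

```r
library(Matrix)

# Hypothetical helper: estimate what cbind()-ing every matrix in `mats`
# would produce, without actually doing the bind.
check_cbind_size <- function(mats) {
  nrows <- vapply(mats, nrow, integer(1))
  stopifnot(length(unique(nrows)) == 1L)  # cbind needs equal row counts

  # Sum as doubles so the totals themselves cannot overflow the integer range
  total_cols <- sum(as.numeric(vapply(mats, ncol, integer(1))))
  total_nnz  <- sum(as.numeric(vapply(mats, function(m) length(m@x), integer(1))))

  limit <- .Machine$integer.max  # 2^31 - 1; @p of a dgCMatrix is an integer vector
  list(
    nrow = nrows[[1L]],
    ncol = total_cols,
    nnz = total_nnz,
    # 4 bytes per row index (@i), 8 per value (@x), 4 per column pointer (@p)
    approx_bytes = 12 * total_nnz + 4 * (total_cols + 1),
    fits = total_cols <= limit && total_nnz <= limit
  )
}

# Small usage example: three 5 x 2 matrices with 2 stored entries each
mats <- replicate(3, sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(1, 2),
                                  dims = c(5, 2)), simplify = FALSE)
info <- check_cbind_size(mats)
str(info)
```

Summing as doubles matters at the scale in the question: 100,000 matrices can easily hold more stored entries in total than an R integer can count.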

Answer 2

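The problem is not in do.call() but in the way cbind is implemented in the Matrix package. It uses recursion to bind the individual arguments together. For example, Matrix::cbind(mat1, mat2, mat3) is translated into something like Matrix::cbind(mat1, Matrix::cbind(mat2, mat3)). Because do.call(cbind, theListofMatrices) is essentially cbind(theListofMatrices[[1]], theListofMatrices[[2]], ...), you are passing far too many arguments to cbind, so you end up with recursion that is nested too deeply, and it fails.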
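To see why the nesting depth grows with the number of arguments, here is a toy re-implementation of the recursive pattern described above. rec_cbind() and the max_depth counter are hypothetical illustrations, not the actual Matrix or methods source. Each extra argument adds one more nested call before any binding happens, so 410 matrices would mean 410 levels of nested calls, and this toy version is expected to run into R's nesting limits itself once the argument count gets large enough.

```r
library(Matrix)

max_depth <- 0L  # deepest nesting level reached by rec_cbind()

# Hypothetical toy version of "cbind(m1, cbind(m2, cbind(m3, ...)))":
# peel off the first argument, bind the rest recursively, then combine.
rec_cbind <- function(..., .depth = 1L) {
  max_depth <<- max(max_depth, .depth)
  args <- list(...)
  if (length(args) == 1L) return(args[[1L]])
  rest <- do.call(rec_cbind, c(args[-1L], list(.depth = .depth + 1L)))
  cbind(args[[1L]], rest)  # only ever two arguments at this point
}

# Five 5 x 1 columns; column k holds the value k in row k
mats <- lapply(1:5, function(k)
  sparseMatrix(i = k, j = 1, x = as.numeric(k), dims = c(5, 1)))
out <- do.call(rec_cbind, mats)
max_depth  # one nesting level per argument
```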

So Ben's comment suggesting Reduce() is a good way to solve the problem, because it avoids the recursion and replaces it with iteration:

tmp <- Reduce(cbind, theListofMatrices[-1], theListofMatrices[[1]])
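Reduce() passes only two matrices to each cbind() call, but it re-copies the growing result at every step, so the entries of the first matrix get copied up to n - 1 times. One alternative, sketched here under the assumption that all elements have the same number of rows, is a divide-and-conquer bind: split the list in half, bind each half, then bind the two results. tree_cbind() is a hypothetical helper, not part of Matrix. Its recursion depth is about log2(n) (roughly 17 levels for 100,000 matrices), and each stored entry should be copied about log2(n) times. It has not been benchmarked here at the scale in the question.

```r
library(Matrix)

# Hypothetical helper: column-bind a non-empty list of sparse matrices
# by recursive halving. Each call passes exactly two matrices to cbind().
tree_cbind <- function(mats) {
  n <- length(mats)
  stopifnot(n >= 1L)
  if (n == 1L) return(mats[[1L]])
  mid <- n %/% 2L
  left  <- tree_cbind(mats[seq_len(mid)])
  right <- tree_cbind(mats[(mid + 1L):n])
  cbind(left, right)
}

# Seven 7 x 1 columns; column k holds the value k in row k
mats <- lapply(1:7, function(k)
  sparseMatrix(i = k, j = 1, x = as.numeric(k), dims = c(7, 1)))
tmp <- tree_cbind(mats)

# Should give the same result as the Reduce() approach
tmp_reduce <- Reduce(cbind, mats[-1], mats[[1]])
all(as.matrix(tmp) == as.matrix(tmp_reduce))
```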

Tags: r, sparse-matrix