Rcpp 本征稀疏矩阵 cbind

Question

我正在研究 Rcpp (Eigen) 中的算法，该算法需要等效于矩阵的 cbind。我发现 R 的 cbind 非常快，而使用 Eigen 非常慢。我想找到一种优化此功能的方法，以便我可以将我的算法保留在 Rcpp 中。到目前为止，我已经找到了另一个，但它适用于密集矩阵上的 cbind

#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

using namespace Rcpp;
using namespace Eigen;

// [[Rcpp::export]]
Eigen::SparseMatrix<double> RcppMatrixCbind(Eigen::MappedSparseMatrix<double>& matrix1,
                                            Eigen::MappedSparseMatrix<double>& matrix2) {

  SparseMatrix<double> out(matrix1.rows(), matrix1.cols() + matrix2.cols());
  std::vector<Triplet<double> > tripletList;
  tripletList.reserve(matrix1.nonZeros() + matrix2.nonZeros());
  for (int k = 0; k < matrix1.outerSize(); ++k)
  {
    for (MappedSparseMatrix<double>::InnerIterator it(matrix1, k); it; ++it)
    {
      tripletList.push_back(Triplet<double>(it.row(), it.col(), it.value()));
    }
  }
  for (int k = 0; k < matrix2.outerSize(); ++k)
  {
    for (MappedSparseMatrix<double>::InnerIterator it(matrix2, k); it; ++it)
    {
      tripletList.push_back(Triplet<double>(it.row(), it.col() + matrix1.cols(), it.value()));
    }
  }
  out.setFromTriplets(tripletList.begin(), tripletList.end());
  return out;
}

/*** R
require(Matrix)
x = rsparsematrix(10000, 500, .1)
system.time(foo <- cbind(x,x))
system.time(bar <- RcppMatrixCbind(x, x))
  */

Answer 1

这更像是一个 Eigen 问题：如何扩展稀疏矩阵？

您在解决方案中所做的是逐个元素地执行所有操作，专用的逐块操作 很可能可以击败它。这就是我们在 Matrix 解决方案中所看到的，转向高效代码，大概来自 CHOLMD。

我快速浏览了 Eigen 文档。它警告：

Regarding read-access, sparse matrices expose the same API than for dense matrices to access to sub-matrices such as blocks, columns, and rows. See Block operations for a detailed introduction. However, for performance reasons, writing to a sub-sparse-matrix is much more limited, and currently only contiguous sets of columns (resp. rows) of a column-major (resp. row-major) SparseMatrix are writable. Moreover, this information has to be known at compile-time, leaving out methods such as block(...) and corner*(...).

但我们很幸运，因为 cbind() 足够块状。一个非常简单的解决方案是

// [[Rcpp::export]]
Eigen::SparseMatrix<double>
RcppMatrixCb2(Eigen::MappedSparseMatrix<double>& matrix1,
              Eigen::MappedSparseMatrix<double>& matrix2) {

  SparseMatrix<double> out(matrix1.rows(), matrix1.cols() + matrix2.cols());

  out.leftCols(matrix1.cols()) = matrix1; 
  out.rightCols(matrix2.cols()) = matrix2; 
  return out;
}

击败了之前的两次尝试：

> benchmark(Matrix=cbind(x,x), 
+           prevSol=RcppMatrixCbind(x,x), 
+           newSol=RcppMatrixCb2(x,x),
+           order="relative")[,1:4]
     test replications elapsed relative
3  newSol          100   0.585    1.000
1  Matrix          100   0.686    1.173
2 prevSol          100   4.603    7.868
>

我的完整文件如下。它缺少对两个函数的测试：我们需要确保矩阵一和矩阵二具有相同的行。

// cf 

#include <RcppEigen.h>
// [[Rcpp::depends(RcppEigen)]]

using namespace Rcpp;
using namespace Eigen;

// [[Rcpp::export]]
Eigen::SparseMatrix<double> RcppMatrixCbind(Eigen::MappedSparseMatrix<double>& matrix1,
                                            Eigen::MappedSparseMatrix<double>& matrix2) {

  SparseMatrix<double> out(matrix1.rows(), matrix1.cols() + matrix2.cols());
  std::vector<Triplet<double> > tripletList;
  tripletList.reserve(matrix1.nonZeros() + matrix2.nonZeros());
  for (int k = 0; k < matrix1.outerSize(); ++k) {
    for (MappedSparseMatrix<double>::InnerIterator it(matrix1, k); it; ++it) {
      tripletList.push_back(Triplet<double>(it.row(), it.col(), it.value()));
    }
  }
  for (int k = 0; k < matrix2.outerSize(); ++k) {
    for (MappedSparseMatrix<double>::InnerIterator it(matrix2, k); it; ++it) {
      tripletList.push_back(Triplet<double>(it.row(), it.col() + matrix1.cols(), it.value()));
    }
  }
  out.setFromTriplets(tripletList.begin(), tripletList.end());
  return out;
}

// [[Rcpp::export]]
Eigen::SparseMatrix<double> RcppMatrixCb2(Eigen::MappedSparseMatrix<double>& matrix1,
                                          Eigen::MappedSparseMatrix<double>& matrix2) {

  SparseMatrix<double> out(matrix1.rows(), matrix1.cols() + matrix2.cols());

  out.leftCols(matrix1.cols()) = matrix1; 
  out.rightCols(matrix2.cols()) = matrix2; 
  return out;
}


/*** R
require(Matrix)
set.seed(42)
x = rsparsematrix(10000, 500, .1)
library(rbenchmark)
benchmark(Matrix=cbind(x,x), 
          prevSol=RcppMatrixCbind(x,x), 
          newSol=RcppMatrixCb2(x,x),
          order="relative")[,1:4]
*/

Rcpp 本征稀疏矩阵 cbind

Rcpp Eigen sparse matrix cbind

r

matrix

rcpp

eigen