Speed up matrix multiplication when result is known to be symmetric

Question

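I know that the result of a matrix multiplication is symmetric. Is there an R package or some standard method that speeds up the computation by calculating only the lower/upper triangle and then copying the result to the other half?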

I know that tcrossprod benefits from this fact when only one argument is supplied, but I want to supply two matrices.
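
For reference, the one-argument case can be checked with a minimal sketch like the one below. The matrix `a` and the seed are made up for illustration. My understanding is that the one-argument form is routed to a symmetric rank-k update in BLAS, but treat that as an assumption about R's internals rather than a documented guarantee.

```r
set.seed(1)
a <- matrix(runif(6), 2, 3)       # illustrative 2 x 3 matrix

one_arg <- tcrossprod(a)          # a %*% t(a), one-argument form
two_arg <- tcrossprod(a, a)       # same product via the two-argument form

isSymmetric(one_arg)              # result of the one-argument form is symmetric
isTRUE(all.equal(one_arg, two_arg))
```

Both forms give the same values; only the one-argument form tells R up front that the result must be symmetric.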

Here is an example where the result is symmetric:

n <- 100
m <- 200
s<-matrix(runif(n^2),n,n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
x <- matrix(runif(m*n), m, n)
x %*% s %*% t(x)
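
The result is symmetric because s is symmetric: t(x %*% s %*% t(x)) = x %*% t(s) %*% t(x) = x %*% s %*% t(x). A small sketch to confirm this numerically, using deliberately tiny dimensions and an arbitrary seed (both are assumptions for illustration only):

```r
set.seed(42)
n <- 5
m <- 8
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]   # make s symmetric
x <- matrix(runif(m * n), m, n)

res <- x %*% s %*% t(x)

isSymmetric(s)             # input is symmetric by construction
isSymmetric(res)           # tolerance-based check on the m x m result
max(abs(res - t(res)))     # any asymmetry is floating-point rounding only
```

Note that isSymmetric compares with a numerical tolerance, so tiny rounding differences between res[i, j] and res[j, i] do not count as asymmetry.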

tcrossprod does not seem to be the solution:

library(microbenchmark)
microbenchmark(x %*% s %*% t(x), tcrossprod(x %*% s, x))

I have tried using Rcpp. Even without the copying step, it is slower than R's multiplication (though I admit I am a beginner C++/Rcpp user):

w <- s %*% t(x)
mm = Rcpp::cppFunction(
'NumericMatrix mmult(NumericMatrix m , NumericMatrix v)
{
  NumericMatrix out(m.nrow(), v.ncol());

  for (int i = 0; i < m.nrow(); i++)
  {
    for (int j = 0; j < i + 1; j++)
    {
      for(int k = 0; k < m.ncol(); k++){
        out(i,j) += m(i,k) * v(k,j) ;
      }
    }
  }
  return out;
}'
)

microbenchmark(mm(x, w), x %*% w)
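
The copying step left out above could look roughly like the sketch below, written in plain R so it does not depend on the compiled function. `mirror_lower` is a hypothetical helper name, and the small dimensions are chosen only for illustration.

```r
# Hypothetical helper: copy the lower triangle of a square matrix
# into its upper triangle, leaving the diagonal untouched.
mirror_lower <- function(out) {
  stopifnot(nrow(out) == ncol(out))
  idx <- upper.tri(out)
  out[idx] <- t(out)[idx]
  out
}

set.seed(1)
n <- 4
m <- 6
s <- matrix(runif(n^2), n, n)
s[lower.tri(s)] <- t(s)[lower.tri(s)]
x <- matrix(runif(m * n), m, n)

full <- x %*% s %*% t(x)
half <- full
half[upper.tri(half)] <- 0        # mimic a routine that only filled j <= i

restored <- mirror_lower(half)
isTRUE(all.equal(restored, full)) # mirrored result matches the full product
```

The mirroring itself touches about half of the m x m entries, so it is cheap compared with the multiplication.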

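I think this would be solved if the sym variable in the .Internal function do_matprod were exposed and could be set to true by the user. However, I really don't want to mess with that stuff...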

Answer 1

The Matrix package does not seem to exploit the symmetry:

> n <- 100
> x <- s <- matrix(runif(n^2),n,n)
> s[lower.tri(s)] <- t(s)[lower.tri(s)]
> 
> library(Matrix)
> s_sym <- Matrix(forceSymmetric(s))
> class(s_sym) # has the symmetric class
[1] "dsyMatrix"
attr(,"package")
[1] "Matrix"
> 
> library(microbenchmark)
> microbenchmark(x %*% x, s %*% s, s_sym %*% s_sym)
Unit: microseconds
            expr min  lq mean median  uq  max neval
         x %*% x 461 496  571    528 625 1008   100
         s %*% s 461 499  560    532 572  986   100
 s_sym %*% s_sym 553 568  667    624 701 1117   100

Nor is there any indication of this in the help file:

The basic matrix product, %*% is implemented for all our Matrix and also for sparseVector classes, fully analogously to R’s base matrix and vector objects. The functions crossprod and tcrossprod are matrix products or “cross products”, ideally implemented efficiently without computing t(.)’s unnecessarily. They also return symmetricMatrix classed matrices when easily detectable, e.g., in crossprod(m), the one argument case. tcrossprod() takes the cross-product of the transpose of a matrix. tcrossprod(x) is formally equivalent to, but faster than, the call x %*% t(x), and so is tcrossprod(x, y) instead of x %*% t(y).

One solution is to write a wrapper function using Rcpp and the BLAS functions available in R_ext/BLAS.h. You can do this as follows. Create a func.cpp like this:

// added to get $(BLAS_LIBS) in compile flags
//[[Rcpp::depends(RcppArmadillo)]]
#include <Rcpp.h>
#include <R_ext/BLAS.h>

/*
  Wrapper for BLAS dsymm. See dsymm http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_ga253c8edb8b21d1b5b1783725c2a6b692.html#ga253c8edb8b21d1b5b1783725c2a6b692
  Only works with side = 'R', i.e. it computes C = alpha * B * A
  where A is the symmetric n x n matrix and B is m x n.
  Note that input is passed by reference with &
*/
// [[Rcpp::export]]
Rcpp::NumericMatrix blas_dsymm(
    char uplo, int m, int n, double alpha,
    const Rcpp::NumericMatrix &A, const Rcpp::NumericMatrix &B){
  // set lda, ldb and ldc
  int lda = n, ldb = m, ldc = m;

  // make new matrix with dim(m, n)
  Rcpp::NumericMatrix C(m, n); // default values are zero
  double beta = 0;

  F77_NAME(dsymm)(
    "R" /* side */, &uplo, &m, &n, &alpha, 
    A.begin(), &lda, B.begin(), &ldb, &beta, C.begin(), &ldc);

  return(C);
}

Then run the following R script:

> n <- 100
> m <- 200
> s<-matrix(runif(n^2),n,n)
> s[lower.tri(s)] <- t(s)[lower.tri(s)]
> x <- matrix(runif(m*n), m, n)
> 
> library("Rcpp")
> sourceCpp("func.cpp")
> 
> out <- x %*% s
> out_blas <- blas_dsymm(
+   uplo = "U", m = nrow(x), n = ncol(x), 
+   alpha = 1, A = s, B = x)
> 
> all.equal(out, out_blas)
[1] TRUE
> 
> library(microbenchmark)
> microbenchmark(
+   dense = x %*% s,
+   BLAS = blas_dsymm(
+     uplo = "U", m = nrow(x), n = ncol(x), 
+     alpha = 1, A = s, B = x))
Unit: microseconds
  expr     min       lq     mean   median       uq      max neval
 dense 880.989 950.3225 1114.744 1066.866 1159.311 2783.213   100
  BLAS 858.866 938.6680 1169.839 1016.495 1225.286 3261.633   100

There seems to be no difference here. Note that you need to have the RcppArmadillo and Rcpp packages installed.

Answer 2

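Do not recode matrix multiplication with for loops. Linear algebra libraries are highly optimized for this, and you are likely to be 10 times slower (or worse).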

For matrix computations, you will not gain (or lose) much by using RcppArmadillo or RcppEigen.

If you want to gain performance, you can change the math library you are using, for example by using the MKL with Microsoft R Open.
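
Before swapping libraries, it helps to see which BLAS/LAPACK your R session is currently linked against. A minimal sketch, assuming a reasonably recent R (3.4.0 or later, where these fields are available); on some platforms the reported paths may be empty:

```r
si <- sessionInfo()
blas_path   <- si$BLAS      # path to the BLAS library in use, if known
lapack_path <- si$LAPACK    # path to the LAPACK library in use, if known
lapack_ver  <- La_version() # LAPACK version string

print(blas_path)
print(lapack_path)
print(lapack_ver)
```

If the paths point at R's bundled reference BLAS, switching to an optimized library is where most of the speedup for `%*%` is likely to come from.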
