比较 RcppArmadillo 和 R 运行非连续子矩阵选择的速度

Question

我的主要目标是 select 非连续子矩阵，使用两组二进制向量作为行和列。这是我使用 Rcpp、RcppArmadillo 和 RcppEigen 在 C++ 中实现的 MCMC 循环需要执行的许多步骤之一。

实现此目的的三种可能方法是 (1) 使用 RcppArmadillo，(2) 从 Rcpp 调用我的 R 函数，以及 (3) 直接使用 R 并将结果传递给 C++。虽然最后一个选项对我来说一点都不方便

然后我比较了这三种场景的性能速度。有趣的是，直接R代码比其他两个快多了！更令我惊讶的是，当我从 Rcpp 调用确切的 R 函数时，它比我直接从 R 调用它时慢得多。我希望它们具有相对相同的运行速度，如中的示例所建议的那样这个older post.

无论如何，计时结果对我来说似乎有点奇怪。对原因有何评论？我使用 Macbook Pro 和 El Capitan OS，2.5 Ghz Intel Core i7。可能与我的系统有关，Mac OSX 或者 Rcpp 在我的机器上的安装方式？

提前致谢！

代码如下：

CPP 部分：

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;

// (1) Using RcppArmadillo functions:
// [[Rcpp::export]]
mat subselect(NumericMatrix X, uvec rows, uvec cols){

  mat XX(X.begin(), X.nrow(),X.ncol(), false);
  mat y = XX.submat(find(rows>0),find(cols>0));
  return (y);
}

// (2) Calling the function from R:
// [[Rcpp::export]]
NumericalMatrix subselect2(NumericMatrix X, NumericVector rows, NumericVector cols){ 

  Environment stats;
  Function submat = stats["submat"];
  NumericMatrix outmat=submat(X,rows,cols);
  return(wrap(outmat));
}

R 段:

library(microbenchmark)

# (3) My R function:
submat <- function(mat,rvec,cvec){
 return(mat[as.logical(rvec),as.logical(cvec)])
}

# Comparing the performances:

// Generating data:
set.seed(432)
rows <- rbinom(1000,1,0.1)
cols <- rbinom(1000,1,0.1)
amat <- matrix(1:1e06,1000,1000)

//benchmarking:
microbenchmark(subselect(amat,rows,cols),
           subselect2(amat,rows,cols),
           submat(amat,rows,cols))

结果：

                         expr     min       lq     mean    median       uq        max   neval   

  subselect(amat, rows, cols) 893.670 1566.882 2297.991  1675.282 2184.783   8462.142     100
 subselect2(amat, rows, cols) 928.418 1581.553 3554.805  1657.454 2060.837 138801.050     100
     submat(amat, rows, cols)  36.313   55.748   66.782    62.709   73.975    136.970     100

Answer 1

这里有几件事值得一提。首先，您在基准测试设计中犯了一个微妙的错误，该错误对 Armadillo 函数 subselect 的性能产生了重大影响。观察：

set.seed(432)
rows <- rbinom(1000, 1, 0.1)
cols <- rbinom(1000, 1, 0.1)

imat <- matrix(1:1e6, 1000, 1000)
nmat <- imat + 0.0

storage.mode(imat)
# [1] "integer"

storage.mode(nmat)
# [1] "double"

microbenchmark(
    "imat" = subselect(imat, rows, cols),
    "nmat" = subselect(nmat, rows, cols)
)
# Unit: microseconds
#  expr      min       lq      mean    median        uq       max neval
#  imat 3088.140 3218.013 4355.2956 3404.4685 4585.1095 21662.540   100
#  nmat  139.298  167.116  223.2271  209.4585  238.6875   533.035   100

尽管 R 经常将整数文字（例如 1、2、3、...）视为浮点值，但序列运算符 : 是少数例外之一，

storage.mode(c(1, 2, 3, 4, 5))
# [1] "double"

storage.mode(1:5)
# [1] "integer"

这就是为什么表达式 matrix(1:1e6, 1000, 1000) returns 是 integer 矩阵，而不是 numeric 矩阵。这是有问题的，因为 subselect 期望 NumericMatrix，而不是 IntegerMatrix，并且传递给它后一种类型会触发深层复制，因此上面的差异超过一个数量级基准。

其次，R 函数 submat 和 C++ 函数 subselect 在二进制索引向量分布方面的相对性能存在显着差异，这可能是由于差异在底层算法中。对于更稀疏的索引（0 比 1 的比例更大），R 函数胜出；而对于更密集的索引，情况正好相反。这似乎也是矩阵大小（或可能只是维度）的函数，如下图所示，其中行和列索引向量是使用 rbinom 生成的，成功参数为 0.0、0.05、0.10，... , 0.95, 1.0 -- 首先是 1e3 x 1e3 矩阵，然后是 1e3 x 1e4 矩阵。这个代码包含在最后。

基准代码：

library(data.table)
library(microbenchmark)
library(ggplot2)

test_data <- function(nr, nc, p, seed = 123) {
    set.seed(seed)
    list(
        x = matrix(rnorm(nr * nc), nr, nc),
        rv = rbinom(nr, 1, p),
        cv = rbinom(nc, 1, p)
    )
}

tests <- lapply(seq(0, 1, 0.05), function(p) {
    lst <- test_data(1e3, 1e3, p)
    list(
        p = p,
        benchmark = microbenchmark::microbenchmark(
            R = submat(lst[[1]], lst[[2]], lst[[3]]),
            Arma = subselect(lst[[1]], lst[[2]], lst[[3]])
        )
    )
})

gt <- rbindlist(
    Map(function(g) {
        data.table(g[[2]])[
            ,.(Median.us = median(time / 1000)), 
            by = .(Expr = expr)
        ][order(Median.us)][
            ,Relative := Median.us / min(Median.us)
        ][,pSuccess := sprintf("%3.2f", g[[1]])]
    }, tests)
)

ggplot(gt) +
    geom_point(
        aes(
            x = pSuccess, 
            y = Relative, 
            color = Expr
        ),
        size = 2,
        alpha = 0.75
    ) +
    theme_bw() +
    ggtitle("1e3 x 1e3 Matrix")

## change `test_data(1e3, 1e3, p)` to
## `test_data(1e3, 1e4, p)` inside of 
## `tests <- lapply(...) ...` to generate 
## the second plot

比较 RcppArmadillo 和 R 运行非连续子矩阵选择的速度

Comparing RcppArmadillo and R running speed for non-contiguous submatrix selection

performance

r

rcpp

比较 RcppArmadillo 和 R 运行 非连续子矩阵选择的速度

Comparing RcppArmadillo and R running speed for non-contiguous submatrix selection

performance

r

rcpp

比较 RcppArmadillo 和 R 运行非连续子矩阵选择的速度