在 rcpp 中提取 arma::mat 矩阵的每个第 k 列

Question

我正在为 class arma::mat 矩阵的子集列而苦苦挣扎。

假设给定 arma::mat X，我试图创建一个索引向量 IDX，以便执行 X.cols(IDX)。特别是，索引向量具有从 1 到 p 的每个第 k 个整数（X 的维度）。例如，一个人可能对每个偶数列感兴趣 IDX=[2,4,6,8, ...].

基于 this documentation，如果 m <= p，可以使用 X.cols(0, m - 1) 轻松提取 [0, 1, 2, ..., m-1] 等连续索引。但是，我找不到使用上述索引向量 IDX 对矩阵进行子集化的好方法。

我想知道如何完成此代码以提供所需的输出。

我的 "subset_armamat.cpp" 文件看起来像

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
mat subset_armamat(mat X, int k){
  uvec IDX = "every k-th integer from 0 to X.ncols";
  return X.cols(IDX);
}

执行定义函数的 R 代码是

library("Rcpp")
sourceCpp("subset_armamat.cpp")
subset_armamat(matrix(1:10, 2, 5, byrow = T), 2)

这预计会产生一个 2×3 矩阵，因为以下 R 代码会给出

> matrix(1:10, 2, 5, byrow = T)[,seq(1, 5, by = 2)]
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    6    8   10

如果您提供任何意见，我们将不胜感激。

p.s。我试过

手动生成序列向量 seq(1,m) * 2，但这不适用于 X.cols()。
或使用 find(seq(1,p) % 2 == 0) 查找索引，但 % 运算符在 seq(1,p) 和 2 之间效果不佳。

Answer 1

这个有效：

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
mat subset_armamat(mat X, int k) {

  // Obtain environment containing function
  Rcpp::Environment base("package:base"); 

  // Make function callable from C++
  Rcpp::Function seq = base["seq"];    

  uvec IDX = as<uvec>(seq(0, X.n_cols, k));
  return X.cols(IDX);
}

我只是从 Rcpp 调用 R 函数 base::seq()。

Answer 2

showed that you can in fact use a uvec to subset a matrix using .cols() even if its not a contiguous range, using the base R seq() function to generate the sequence. I will further demonstrate that you can generate the sequence using an Armadillo function; you can use arma::regspace() -- it "generate[s] a vector with regularly spaced elements" (Armadillo documentation source):

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
mat subset_armamat(mat X, int k) {
    uvec IDX = regspace<uvec>(0,  k,  X.n_cols-1);
    return X.cols(IDX);
}

作为与调用 R 的 seq() 的比较（其中 subset_armamatR() 是 F.Privé 的回答中的函数）：

library("Rcpp")
sourceCpp("subset_armamat.cpp")
mat <- matrix(1:10, 2, 5, byrow = TRUE)
subset_armamat(mat, 2)
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    6    8   10
subset_armamatR(mat, 2)
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    6    8   10
library(microbenchmark)
microbenchmark(Rseq = subset_armamatR(mat, 2),
               regspace = subset_armamat(mat, 2))
#> Unit: microseconds
#>      expr     min       lq     mean   median       uq       max neval cld
#>      Rseq 235.535 239.1615 291.1954 241.9850 248.6005  4704.467   100   a
#>  regspace  14.221  15.0225 520.9235  15.8165  16.6740 50408.375   100   a

更新：通过引用传递

来自 hbrerkere warrants some brief additional discussion. If you are calling this function from C++, you'll gain speed by changing mat subset_armamat(mat X, int k) to mat subset_armamat(const mat& X, int k). Passing by reference like this avoids an unnecessary copy, and when you do not intend to change an object passed by reference, you should use const. However, if you are calling this function from R, you cannot avoid a copy as arma::mat is not a native R type (see, for example, by Dirk Eddelbuettel (the maintainer of both Rcpp and RcppArmadillo 的评论）。考虑以下示例：

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
void reference_example(arma::mat& X) {
    X(0, 0) = 42;
}

// [[Rcpp::export]]
void print_reference_example(arma::mat X) {
    reference_example(X);
    Rcpp::Rcout << X << "\n";
}

然后从 R 调用：

library("Rcpp")
sourceCpp("reference_example.cpp")
mat <- matrix(1:4, 2, 2)
mat
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
reference_example(mat)
mat
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
print_reference_example(mat)
#>    42.0000    3.0000
#>     2.0000    4.0000
mat
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4

在 rcpp 中提取 arma::mat 矩阵的每个第 k 列

Extract every k-th column of arma::mat matrix in rcpp

r

armadillo

rcpp

更新：通过引用传递