Different results returned by RcppArmadillo-defined functions with the same data

Question

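Let X be a 3-dimensional array of size r*c*n, and let y be a vector of length n with two levels. I want to compute the mean matrix of X grouped by y. I tried to define the function using arma::cube, but, strangely, it returns a different result every time I call it. Even for very small (r, c, n), the results always include NaN.

The contents of the .cpp file are as follows: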

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::export]]
List f(arma::cube X, CharacterVector y){
    unsigned int n     = X.n_slices;
    unsigned int rNums = X.n_rows;
    unsigned int cNums = X.n_cols;

    arma::mat mu1(rNums, cNums);
    arma::mat mu2(rNums, cNums);

    unsigned int n1 = 0;
    unsigned int n2 = 0;
    CharacterVector yLevels = sort_unique(y);
    for(unsigned int i=0; i < y.length(); i++){
        if(y[i] == yLevels[0]) {
            mu1 += X.slice(i);
            n1++;
        } else {
            mu2 += X.slice(i);
            n2++;
        }
    }
    mu1 /= n1;
    mu2 /= n2;

    return Rcpp::List::create(Named("mu1") = mu1,
                              Named("mu2") = mu2);
}

I then source this .cpp file in R and call the function as follows:

> rm(list=ls())
> options(digits=2)
> library(Rcpp)
> sourceCpp("Cxx_File.cpp")
> 
> set.seed(2018)
> X <- array(rnorm(4*5*10), dim=c(4, 5, 10))
> y <- c(rep("1", 4), rep("2", 6))
> 
> f(X, y)
$mu1
      [,1]   [,2]   [,3]  [,4]   [,5]
[1,]  0.33  0.076  0.230  0.43 -0.801
[2,] -0.50 -0.145  0.162 -0.21  0.629
[3,] -0.13  0.516 -0.266 -0.37 -0.261
[4,]  0.73  0.226 -0.071 -0.36  0.035

$mu2
      [,1]   [,2]  [,3] [,4]   [,5]
[1,] 0.098  0.016  0.27 0.70  0.017
[2,] 0.434 -0.164  0.40 0.77  0.104
[3,] 0.060  0.226  0.05  NaN  0.091
[4,] 0.261  0.313 -0.47 0.18 -0.301

The result contains NaN, which should not happen. What is the reason for this?

Answer 1

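This happens because arma::mat, as constructed here, does not zero-initialize the matrix. mu1 and mu2 start out holding whatever happens to be in that memory, and the loop then adds the slices on top of those values, which explains both the run-to-run differences and the NaN. Call mu1.fill(0) and mu2.fill(0) right after declaring these variables, and do the same for any other accumulator in your full code, such as sigma2.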
