Should I prefer Rcpp::NumericVector over std::vector?

Question

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

For example, take the two functions below:

// [[Rcpp::export]]
Rcpp::NumericVector foo(const Rcpp::NumericVector& x) {
  Rcpp::NumericVector tmp(x.length());
  for (int i = 0; i < x.length(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

// [[Rcpp::export]]
std::vector<double> bar(const std::vector<double>& x) {
  std::vector<double> tmp(x.size());
  for (int i = 0; i < x.size(); i++)
    tmp[i] = x[i] + 1.0;
  return tmp;
}

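They are equivalent when considering their working and benchmarked performance. I understand that Rcpp offers sugar and vectorized operations, but if the function only takes an R vector as input and returns a vector as output, does it make any difference which of the two I use? Can using std::vector<double> lead to any possible problems when interacting with R?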

Answer 1

"If unsure, just time it."

All it takes is adding these few lines to the file you already have:

/*** R
library(microbenchmark)
x <- 1.0* 1:1e7   # make sure it is numeric
microbenchmark(foo(x), bar(x), times=100L)
*/

Then simply calling sourceCpp("...yourfile...") generates the following result (plus a warning about the signed/unsigned comparison):

R> library(microbenchmark)

R> x <- 1.0* 1:1e7   # make sure it is numeric

R> microbenchmark(foo(x), bar(x), times=100L)
Unit: milliseconds
   expr     min      lq    mean  median      uq      max neval cld
 foo(x) 31.6496 31.7396 32.3967 31.7806 31.9186  54.3499   100  a 
 bar(x) 50.9229 51.0602 53.5471 51.1811 51.5200 147.4450   100   b
R>

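Your bar() solution has to make a copy in order to create an R object in R's memory pool. foo() does not. That matters for large vectors that you run over many, many times. Here the ratio of the median timings is about 1.6 (51.2 ms for bar() against 31.8 ms for foo()).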

In practical terms, the difference may not matter, and you can simply pick the coding style you prefer.
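The extra copying described above can be modeled in plain C++, without Rcpp. This is only a conceptual sketch: the names foo_like, bar_like, r_data and r_out are made up for illustration, the raw pointers stand in for memory owned by R, and the code is not what Rcpp actually generates. It merely shows where a std::vector<double> signature adds one copy on the way in and one on the way out.

```cpp
#include <cstddef>
#include <vector>

// Conceptual model only; none of these names come from Rcpp.
// r_data / r_out stand in for buffers owned by R numeric vectors.

// bar()-style path: copy in, compute, copy out.
// Returns how many elements were copied in addition to the computation.
std::size_t bar_like(const double* r_data, std::size_t n, double* r_out) {
    std::vector<double> x(r_data, r_data + n);  // copy 1: R buffer -> std::vector
    std::vector<double> tmp(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        tmp[i] = x[i] + 1.0;
    for (std::size_t i = 0; i < n; ++i)         // copy 2: std::vector -> R buffer
        r_out[i] = tmp[i];
    return 2 * n;
}

// foo()-style path: read from and write to the R-owned buffers directly.
std::size_t foo_like(const double* r_data, std::size_t n, double* r_out) {
    for (std::size_t i = 0; i < n; ++i)
        r_out[i] = r_data[i] + 1.0;
    return 0;  // no extra element copies
}
```

Both functions produce the same output; only the amount of data shuffled around differs, which is why the gap only becomes visible for large inputs such as the 1e7-element vector in the benchmark.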

Answer 2

Are equivalent when considering their working and benchmarked performance.

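I doubt that the benchmarks are accurate, since going from a SEXP to a std::vector<double> requires a deep copy from one data structure to another. (And while I was typing this, @DirkEddelbuettel ran a microbenchmark.)
The markup on an Rcpp object (e.g. const Rcpp::NumericVector& x) is just visual sugar. By default, the object you are given is a pointer, so a modification can easily ripple through to other objects (see below). Thus, there is no true equivalent of const std::vector<double>& x, which effectively "locks" the data and "passes a reference".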

Can using std::vector<double> lead to any possible problems when interacting with R?

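In short, no. The only penalty you pay is the transference between objects.

The upside of this transference is that std::vector objects never alias one another. Modifying a NumericVector that has been assigned to another NumericVector updates both in a domino effect, whereas each std::vector<T> is an independent copy of the other. Therefore, the following cannot happen with std::vector: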

#include <Rcpp.h>
using namespace Rcpp;  // needed for the unqualified NumericVector and Rcout below

// [[Rcpp::export]]
void test_copy(){
    NumericVector A = NumericVector::create(1, 2, 3);
    NumericVector B = A;  // B refers to the same underlying R object as A

    Rcout << "Before: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;

    A[1] = 5; // 2 -> 5

    Rcout << "After: " << std::endl << "A: " << A << std::endl << "B: " << B << std::endl;
}

This gives:

test_copy()
# Before: 
# A: 1 2 3
# B: 1 2 3
# After: 
# A: 1 5 3
# B: 1 5 3
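For contrast, here is a minimal sketch of the same experiment with std::vector<double> in plain C++ (no Rcpp involved). The function name copy_is_independent is hypothetical and exists only for this illustration. Copy-constructing B from A duplicates the elements, so the later write to A does not reach B.

```cpp
#include <vector>

// Hypothetical helper, not part of Rcpp: mirrors test_copy() with std::vector.
// Returns B[1] after A[1] has been overwritten.
double copy_is_independent() {
    std::vector<double> A{1.0, 2.0, 3.0};
    std::vector<double> B = A;  // copy construction: B owns its own buffer

    A[1] = 5.0;                 // 2 -> 5, only in A

    return B[1];                // B is unaffected by the write to A
}
```

On the Rcpp side, Rcpp::clone(A) is the usual way to request an independent copy of a NumericVector when this kind of sharing is not wanted.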

Is there any reason why I should prefer Rcpp::NumericVector over std::vector<double>?

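There are a few reasons:

As hinted at above, using Rcpp::NumericVector avoids a deep copy to and from a C++ std::vector<T>.
You gain access to the sugar functions.
You can 'mark up' Rcpp objects in C++ (e.g. by adding attributes via .attr()).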

c++

rcpp