How to avoid reading data from R environment within Rcpp function

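Even though MyCppFunction(NumericVector x) returns the desired output, I am not sure what the proper/efficient way is to avoid reading the data in the variable myY from the R environment, without passing it as a function argument.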

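The reason I don't pass the data as an argument is that I will eventually hand the C++ function to a minimization routine as the objective function, and that routine only accepts functions of a single argument, namely myX. For example, in R I would pass myY to optim(...) like this: optim(par, fn=MyRFunction, y=myY).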
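The constraint described above (a minimizer that only accepts f(x)) is commonly handled in C++ by binding the extra data into a closure, which is the same idea as optim's y=myY. Below is a minimal plain-C++ sketch, assuming std::vector<double> in place of Rcpp's NumericVector so it compiles without R; make_objective and best_value are hypothetical helpers, and best_value is only a stand-in for a real minimizer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Objective type a one-argument minimizer expects: f(x) -> double.
using Objective = std::function<double(const std::vector<double>&)>;

// The two-argument objective from the question: sum_i x[i] * y[i].
double dot_objective(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());  // precondition: same length
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * y[i];
    return res;
}

// Bind the data once (hypothetical helper). y is captured by value, so the
// closure owns its copy and never looks anything up in an outer environment.
Objective make_objective(std::vector<double> y) {
    return [y = std::move(y)](const std::vector<double>& x) {
        return dot_objective(x, y);
    };
}

// Hypothetical stand-in for a minimizer that only accepts f(x):
// it evaluates f at each candidate and returns the smallest value seen.
double best_value(const Objective& f, const std::vector<std::vector<double>>& candidates) {
    assert(!candidates.empty());
    double best = f(candidates.front());
    for (const auto& c : candidates) {
        double v = f(c);
        if (v < best) best = v;
    }
    return best;
}
```

The minimizer only ever sees a callable of one argument; the data travels inside the closure rather than through a global lookup.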

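Any suggestions on how to properly access myY from within the C++ function are welcome. Here is a minimal example of what I fear is a very wrong way to do it: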

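Update: I modified the code to better reflect the context and what was suggested in the answers. Just to be clear, the focus of my question is this line: NumericVector y = env["myY"]; // How to avoid this?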

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double MyCppFunction(NumericVector x) {

  Environment env = Environment::global_env();
  NumericVector y = env["myY"];  // How to avoid this?

  double res = 0;

  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));

  return res;
}

double MyCppFunctionNoExport(NumericVector x) {

  Environment env = Environment::global_env();
  NumericVector y = env["myY"];  // How to avoid this?

  double res = 0;

  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));

  return res;
}

// [[Rcpp::export]]
double MyCppFunction2(NumericVector x, NumericVector y) {
  double res = 0;

  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));

  return res;
}

// [[Rcpp::export]]
double MyRoutine(NumericVector x, Function fn) {

  for (int i = 0; i < x.size(); i++) fn(x);

  return 0;
}

// [[Rcpp::export]]
double MyRoutineNoExport(NumericVector x) {

  for (int i = 0; i < x.size(); i++) MyCppFunctionNoExport(x);

  return 0;
}

/*** R
MyRFunction <- function(x, y=myY) {
  res = 0
  for(i in 1:length(x)) res = res + (x[i]*y[i])
  return (res)
}

callMyCppFunction2 <- function(x) {
   MyCppFunction2(x, myY)
}

set.seed(123456)

myY = rnorm(1e3)
myX = rnorm(1e3)

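# note: all.equal() compares only its first two arguments; the third positional argument
# is matched to all.equal.numeric()'s 'tolerance', so callMyCppFunction2(myX) is not compared here
all.equal(MyCppFunction(myX), MyRFunction(myX), callMyCppFunction2(myX))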

require(rbenchmark)

benchmark(MyRoutine(myX, fn=MyCppFunction),
          MyRoutine(myX, fn=MyRFunction),
          MyRoutine(myX, fn=callMyCppFunction2),
          MyRoutineNoExport(myX), order="relative")[, 1:4]

*/

Output:

$ Rscript -e 'Rcpp::sourceCpp("stack.cpp")'
> MyRFunction <- function(x, y = myY) {
+     res = 0
+     for (i in 1:length(x)) res = res + (x[i] * y[i])
+     return(res)
+ }

> callMyCppFunction2 <- function(x) {
+     MyCppFunction2(x, myY)
+ }

> set.seed(123456)

> myY = rnorm(1000)

> myX = rnorm(1000)

> all.equal(MyCppFunction(myX), MyRFunction(myX), callMyCppFunction2(myX))
[1] TRUE

> require(rbenchmark)
Loading required package: rbenchmark

> benchmark(MyRoutine(myX, fn = MyCppFunction), MyRoutine(myX, 
+     fn = MyRFunction), MyRoutine(myX, fn = callMyCppFunction2), 
+     MyRoutineNoEx .... [TRUNCATED] 
                                     test replications elapsed relative
4                  MyRoutineNoExport(myX)          100   1.692    1.000
1      MyRoutine(myX, fn = MyCppFunction)          100   3.047    1.801
3 MyRoutine(myX, fn = callMyCppFunction2)          100   3.454    2.041
2        MyRoutine(myX, fn = MyRFunction)          100   8.277    4.892

optim does allow passing extra variables. Here we minimize f with respect to x and pass in the additional variable a:

f <- function(x, a) sum((x - a)^2)
optim(1:2, f, a = 1)

Giving:

$par
[1] 1.0000030 0.9999351

$value
[1] 4.22133e-09

$counts
function gradient 
      63       NA 

$convergence
[1] 0

$message
NULL
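optim() forwards everything after fn through its ... argument to every call of fn. A C++ routine can offer the same interface with a variadic template. The sketch below is illustrative only: crude_minimize is a hypothetical name, and it uses a simple fixed-step coordinate search rather than optim()'s default Nelder-Mead method.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// f(x, a) = sum_i (x[i] - a)^2, the objective from the optim() example.
double sq_dist(const std::vector<double>& x, double a) {
    double s = 0.0;
    for (double xi : x) s += (xi - a) * (xi - a);
    return s;
}

// Hypothetical routine mimicking optim(par, fn, ...): any extra arguments
// after fn are forwarded unchanged to every call of fn.
// Search strategy: try +/- step on each coordinate, keep improving moves,
// halve the step when nothing improves. NOT Nelder-Mead.
template <class F, class... Extra>
std::vector<double> crude_minimize(std::vector<double> par, F fn, const Extra&... extra) {
    double step = 0.5;
    double best = fn(par, extra...);
    while (step > 1e-6) {
        bool improved = false;
        for (std::size_t i = 0; i < par.size(); ++i) {
            for (double dir : {step, -step}) {
                par[i] += dir;
                double v = fn(par, extra...);
                if (v < best) {
                    best = v;
                    improved = true;
                } else {
                    par[i] -= dir;  // undo a move that did not help
                }
            }
        }
        if (!improved) step /= 2;
    }
    return par;
}
```

Called as crude_minimize({1.0, 2.0}, sq_dist, 1.0), it mirrors optim(1:2, f, a = 1): the objective keeps its two-argument signature and the routine supplies a on every evaluation.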

Give the C++ function two arguments and wrap it in an R function that supplies y:

#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double MyCppFunction(NumericVector x, NumericVector y) {
  return (sum(x) + sum(y));
}

R side:

callMyCFunc <- function(x) {
   MyCppFunction(x, myY)
}
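The R wrapper above is a closure over myY. The C++ equivalent is a function object that stores the data once and exposes a one-argument operator(). A minimal sketch, again using std::vector<double> instead of NumericVector; DotWithY and my_routine are hypothetical names (my_routine mirrors the question's MyRoutine).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Function object playing the role of callMyCFunc (hypothetical name):
// it stores a reference to y once; operator() has the one-argument signature.
// The referenced vector must outlive the functor.
class DotWithY {
public:
    explicit DotWithY(const std::vector<double>& y) : y_(y) {}

    double operator()(const std::vector<double>& x) const {
        assert(x.size() == y_.size());  // precondition: same length
        double res = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * y_[i];
        return res;
    }

private:
    const std::vector<double>& y_;
};

// Analogue of the question's MyRoutine: calls fn(x) x.size() times and,
// unlike the original (which returns 0), returns the last value obtained.
template <class F>
double my_routine(const std::vector<double>& x, const F& fn) {
    double last = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) last = fn(x);
    return last;
}
```

Storing a const reference avoids copying y on every call, at the cost of requiring the original vector to outlive the functor; capturing by value (as a lambda with [y] would) trades one up-front copy for that safety.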

Another solution: set up a global variable on the C++ side:

#include <Rcpp.h>
using namespace Rcpp;

static NumericVector yglobal;

// [[Rcpp::export]]
void set_Y(NumericVector y) {
  yglobal = y;
}

// [[Rcpp::export]]
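double MyCppFunction(NumericVector x) {
  // guard against reading past the end of yglobal (e.g. set_Y() not called yet)
  if (x.size() != yglobal.size()) stop("length(x) must equal the length of the y passed to set_Y()");
  double res = 0;
  for (int i = 0; i < x.size(); i++) res = res + (x(i) * yglobal(i));
  return res;
}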

R side:

set.seed(123456)

myY = rnorm(1000)
set_Y(myY);
myX = rnorm(1000)

MyCppFunction(myX)

(Note: static gives yglobal internal linkage, which limits its visibility to this source file, i.e. your specific script.)
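The same static-global pattern in plain C++, for readers who want to try it without R. set_y and my_cpp_function are hypothetical counterparts of set_Y and MyCppFunction, and std::vector<double> replaces NumericVector. Because the data lives in a single file-level variable, only one dataset can be active at a time, and concurrent calls that need different data would interfere with each other.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// File-local storage: static gives internal linkage, so no other
// translation unit can see or clash with this name.
static std::vector<double> yglobal;

// Counterpart of set_Y() (hypothetical name): copy the data in once, before optimising.
void set_y(const std::vector<double>& y) { yglobal = y; }

// Counterpart of MyCppFunction(x) (hypothetical name): reads the stored data
// instead of an R environment; throws if the lengths do not match.
double my_cpp_function(const std::vector<double>& x) {
    if (x.size() != yglobal.size())
        throw std::invalid_argument("x and stored y differ in length");
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * yglobal[i];
    return res;
}
```

Compared with the closure and functor approaches, this keeps the exported signature at one argument with no wrapper, but the "which y is current" state is now hidden and must be reset explicitly before each new problem.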