How to avoid reading data from the R environment within an Rcpp function
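Even though MyCppFunction(NumericVector x) returns the desired output, I am not sure what the proper/efficient way is to get at the data in the variable myY without passing it as a function argument.
The reason I do not pass the data as an argument is that I will eventually pass the C++ function as an objective function to be minimized, and the minimization routine only accepts functions of one argument, namely myX. As an example: in R, I would pass myY to optim(...) as follows: optim(par, fn=MyRFunction, y=myY).
Any advice on how to properly access myY from within the C++ function is welcome. Here is a minimal example of what I fear is a very wrong way to do it:
Update: I have modified the code to better reflect the context and what was proposed in the answers. Just in case, the focus of my question is this line: NumericVector y = env["myY"]; // How to avoid this?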
#include <Rcpp.h>
using namespace Rcpp;

// Looks up myY in the global environment on every call.
// [[Rcpp::export]]
double MyCppFunction(NumericVector x) {
  Environment env = Environment::global_env();
  NumericVector y = env["myY"]; // How to avoid this?
  double res = 0;
  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));
  return res;
}

// Same body, but not exported: only callable from C++.
double MyCppFunctionNoExport(NumericVector x) {
  Environment env = Environment::global_env();
  NumericVector y = env["myY"]; // How to avoid this?
  double res = 0;
  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));
  return res;
}

// Two-argument version: y is passed explicitly.
// [[Rcpp::export]]
double MyCppFunction2(NumericVector x, NumericVector y) {
  double res = 0;
  for (int i = 0; i < x.size(); i++) res = res + (x(i) * y(i));
  return res;
}

// Stand-in for the minimizer: calls an R-level function fn repeatedly.
// [[Rcpp::export]]
double MyRoutine(NumericVector x, Function fn) {
  for (int i = 0; i < x.size(); i++) fn(x);
  return 0;
}

// Stand-in for the minimizer: calls the C++ function directly.
// [[Rcpp::export]]
double MyRoutineNoExport(NumericVector x) {
  for (int i = 0; i < x.size(); i++) MyCppFunctionNoExport(x);
  return 0;
}

/*** R
MyRFunction <- function(x, y=myY) {
  res = 0
  for(i in 1:length(x)) res = res + (x[i]*y[i])
  return (res)
}
callMyCppFunction2 <- function(x) {
  MyCppFunction2(x, myY)
}
set.seed(123456)
myY = rnorm(1e3)
myX = rnorm(1e3)
# Note: all.equal() compares only its first two arguments here; the third
# positional argument is matched to `tolerance`.
all.equal(MyCppFunction(myX), MyRFunction(myX), callMyCppFunction2(myX))
require(rbenchmark)
benchmark(MyRoutine(myX, fn=MyCppFunction),
          MyRoutine(myX, fn=MyRFunction),
          MyRoutine(myX, fn=callMyCppFunction2),
          MyRoutineNoExport(myX), order="relative")[, 1:4]
*/
Output:
$ Rscript -e 'Rcpp::sourceCpp("stack.cpp")'
> MyRFunction <- function(x, y = myY) {
+ res = 0
+ for (i in 1:length(x)) res = res + (x[i] * y[i])
+ return(res)
+ }
> callMyCppFunction2 <- function(x) {
+ MyCppFunction2(x, myY)
+ }
> set.seed(123456)
> myY = rnorm(1000)
> myX = rnorm(1000)
> all.equal(MyCppFunction(myX), MyRFunction(myX), callMyCppFunction2(myX))
[1] TRUE
> require(rbenchmark)
Loading required package: rbenchmark
> benchmark(MyRoutine(myX, fn = MyCppFunction), MyRoutine(myX,
+ fn = MyRFunction), MyRoutine(myX, fn = callMyCppFunction2),
+ MyRoutineNoEx .... [TRUNCATED]
                                     test replications elapsed relative
4                  MyRoutineNoExport(myX)          100   1.692    1.000
1      MyRoutine(myX, fn = MyCppFunction)          100   3.047    1.801
3 MyRoutine(myX, fn = callMyCppFunction2)          100   3.454    2.041
2        MyRoutine(myX, fn = MyRFunction)          100   8.277    4.892
optim does allow you to pass additional variables. Here we minimize f over x and pass in the additional variable a:
f <- function(x, a) sum((x - a)^2)
optim(1:2, f, a = 1)
giving:
$par
[1] 1.0000030 0.9999351
$value
[1] 4.22133e-09
$counts
function gradient
63 NA
$convergence
[1] 0
$message
NULL
Use two arguments and wrap the C++ function in an R function.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double MyCppFunction(NumericVector x, NumericVector y) {
  return (sum(x) + sum(y));
}
R side:
# One-argument wrapper: myY is looked up in the enclosing R environment
callMyCFunc <- function(x) {
  MyCppFunction(x, myY)
}
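The same binding idea can be sketched in plain C++ for the case where the minimization routine itself lives in C++. This is only an illustration under assumptions: std::vector<double> stands in for NumericVector, and the names Objective, dot and bind_y are hypothetical, not part of Rcpp or of the code above. The lambda captures y once, so the caller only ever sees a function of x, much like callMyCFunc does on the R side.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical alias: the one-argument signature a minimizer would expect.
using Objective = std::function<double(const std::vector<double>&)>;

// Two-argument function, analogous to the dot product in the question.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    if (x.size() != y.size())
        throw std::invalid_argument("x and y must have the same length");
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * y[i];
    return res;
}

// Bind y once; the returned callable takes only x.
// y is captured by value, so the closure owns its own copy of the data.
Objective bind_y(std::vector<double> y) {
    return [y](const std::vector<double>& x) { return dot(x, y); };
}
```

A routine that accepts an Objective can then be handed bind_y(myY) without knowing that y exists.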
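Another solution: keep the data in a global variable on the C++ side:
#include <Rcpp.h>
using namespace Rcpp;

// File-level handle to y, set once from R via set_Y()
static NumericVector yglobal;

// [[Rcpp::export]]
void set_Y(NumericVector y) {
  yglobal = y;
}

// [[Rcpp::export]]
double MyCppFunction(NumericVector x) {
  // Assumes set_Y() was called first with a vector at least as long as x
  double res = 0;
  for (int i = 0; i < x.size(); i++) res = res + (x(i) * yglobal(i));
  return res;
}
R side:
set.seed(123456)
myY = rnorm(1000)
set_Y(myY)
myX = rnorm(1000)
MyCppFunction(myX)
(Note: the purpose of static is to limit the scope of the variable to this particular source file.)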
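One caveat with the global approach is that MyCppFunction silently depends on set_Y having been called with data of a matching length. The sketch below restates the pattern in plain C++ with that check added. It is an illustration only: std::vector<double> stands in for NumericVector, and y_global, set_y and objective are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// File-scope static: internal linkage, so the name is visible only in this
// translation unit (the role static plays in the answer above).
static std::vector<double> y_global;

// Store the data once, before the minimizer starts calling objective().
void set_y(const std::vector<double>& y) { y_global = y; }

// One-argument objective: dot product of x with the stored data.
double objective(const std::vector<double>& x) {
    // Guard against a missing set_y() call or mismatched lengths.
    if (x.size() != y_global.size())
        throw std::length_error("set_y() must be called with a vector as long as x");
    double res = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) res += x[i] * y_global[i];
    return res;
}
```

Note one difference from the Rcpp version: assigning a std::vector copies the data, whereas assigning one NumericVector to another makes both refer to the same underlying R object.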