How to prevent Rcpp from evaluating 'call' objects

Question

I need a simple wrapper to serialize arbitrary R objects from within Rcpp code. Below is a simplified version of my code:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(Rcpp::RObject x) {
  // Look up base::serialize and call it through Rcpp::Function
  Rcpp::Function serialize = Rcpp::Environment::namespace_env("base")["serialize"];
  return serialize(x, R_NilValue);
}

This works fine, but I found that for objects of class call, the call is evaluated before it is serialized. How can I prevent this? I just want to mimic serialize() in R.

# Works as intended
identical(serialize(iris, NULL), cpp_serialize(iris))

# Does not work: call is evaluated
call_object <- call("rnorm", 1000)
identical(serialize(call_object, NULL), cpp_serialize(call_object))

Update: I have a workaround (see below), but I am still very interested in a proper solution.

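Rcpp::RawVector cpp_serialize(Rcpp::RObject x) {
  Rcpp::Environment env;      // default-constructed Environment is the global environment
  env["MY_R_OBJECT"] = x;     // bind x to a name instead of splicing it into the call
  Rcpp::ExpressionVector expr("serialize(MY_R_OBJECT, NULL)");
  Rcpp::RawVector buf = Rcpp::Rcpp_eval(expr, env);
  return buf;
}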

Answer 1

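tl;dr: The question is how to serialize to a raw vector from C. The (compiled C) function serializeToRaw() in the RApiSerialize package exposes R's own serialization code. As the benchmark below shows, it is more than three times faster than the workaround suggested above.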

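Longer answer: I would not recommend using Rcpp::Function() for this. We do in fact provide a proper package for accessing R's serialization: RApiSerialize. It does not do much, but it exports exactly two functions, one to serialize and one to unserialize, to and from RAW, which the RcppRedis package needs and uses.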

So we can do the same here. I just called Rcpp.package.skeleton() to create a package 'jeroen', added LinkingTo: and Imports: to DESCRIPTION and an import() directive to NAMESPACE, and that was all it took:

#include <Rcpp.h>
#include <RApiSerializeAPI.h>       // provides C API with serialization

// [[Rcpp::export]]
Rcpp::RawVector cpp_serialize(SEXP s) {
  Rcpp::RawVector x = serializeToRaw(s);    // from RApiSerialize
  return x;
}

It is basically a simpler version of what you had above.

We can call it just as you did:

testJeroen <- function() {
    ## Works as intended
    res <- identical(serialize(iris, NULL), cpp_serialize(iris))

    ## Didn't work above, works now
    call_object <- call("rnorm", 1000)
    res <- res && 
           identical(serialize(call_object, NULL), cpp_serialize(call_object))

    res
}

And lo and behold, it works:

R> library(jeroen)
Loading required package: RApiSerialize
R> testJeroen()
[1] TRUE
R>

In short: if you don't want to mess with R, do not use Rcpp::Function() objects.

Benchmark: using the simple

library(jeroen)             # package containing both functions from here 
library(microbenchmark)
microbenchmark(cpp=cpp_serialize(iris),  # my suggestion
               env=env_serialize(iris))  # OP's suggestion, renamed

we get

edd@max:/tmp/jeroen$ Rscript tests/quick.R 
Loading required package: RApiSerialize
Unit: microseconds
 expr    min      lq    mean  median      uq     max neval cld
  cpp 17.471 22.1225 28.0987 24.4975 26.4795 420.001   100  a 
  env 85.028 91.0055 94.8772 92.9465 94.9635 236.710   100   b
edd@max:/tmp/jeroen$

showing that the OP's workaround is more than three times slower (median 92.9 vs. 24.5 microseconds).

Answer 2

I think you have found unexpected behaviour in the Rcpp::Function class. An MRE:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
RObject cpp_identity(RObject x) {
  Rcpp::Function identity("identity");
  return identity(x);
}

/*** R
quoted <- quote(print(1));
identity(quoted)
cpp_identity(quoted)
*/

gives

> quoted <- quote(print(1));

> identity(quoted)
print(1)

> cpp_identity(quoted)
[1] 1
[1] 1

This happens because, behind the scenes, Rcpp effectively performs this evaluation:

Rcpp_eval(Rf_lang2(Rf_install("identity"), x))

which is essentially the same as

eval(call("identity", quoted))

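but the call object is not 'protected' from evaluation: it is spliced into the new call as an argument, so evaluating identity(...) evaluates it as well.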
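If Rcpp::Function has to be used anyway, one possible way to protect the argument is to wrap it in quote() before passing it on, so that evaluating the constructed call yields the original object instead of running it. The following is only a minimal sketch under that assumption; the name cpp_identity_quoted is hypothetical and not part of Rcpp, and the behaviour may depend on the Rcpp version in use.

```cpp
#include <Rcpp.h>

// Hypothetical variant of cpp_identity() from the MRE above.
// [[Rcpp::export]]
Rcpp::RObject cpp_identity_quoted(Rcpp::RObject x) {
  Rcpp::Function identity("identity");
  // Build the call quote(<x>); evaluating it returns x itself, unevaluated.
  Rcpp::Language quoted("quote", x);
  // Rcpp now evaluates identity(quote(<x>)), so x should reach identity() intact.
  return identity(quoted);
}
```

With this wrapper, cpp_identity_quoted(quote(print(1))) is expected to hand back the call rather than print 1. The same wrapping could be applied to the serialize() wrapper from the question, although the RApiSerialize approach in Answer 1 avoids the round trip through R altogether.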


r

rcpp