Concatenate StringVector with Rcpp

I can't figure out how to concatenate two strings with Rcpp. I suspect there is an obvious answer, but the documentation hasn't helped me:

http://gallery.rcpp.org/articles/working-with-Rcpp-StringVector/

http://gallery.rcpp.org/articles/strings_with_rcpp/

StringVector concatenate(StringVector a, StringVector b)
{
 StringVector c;
 c= ??;
 return c;
}

I would like to get output like this:

a=c("a","b"); b=c("c","d");
concatenate(a,b)
[1] "ac" "bd"

There are probably several ways to approach this, but here is one option using std::transform:

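#include <Rcpp.h>
#include <algorithm>
#include <string>
#include <vector>
using namespace Rcpp;

// Binary functor: std::string + string_proxy uses the operator+ overload
// that Rcpp provides for exactly this pair of argument types.
struct Functor {
    std::string
    operator()(const std::string& lhs, const internal::string_proxy<STRSXP>& rhs) const
    {
        return lhs + rhs;
    }
};

// [[Rcpp::export]]
CharacterVector paste2(CharacterVector lhs, CharacterVector rhs)
{
    // std::transform reads rhs in lockstep with lhs; guard against overruns.
    if (lhs.size() != rhs.size())
        stop("lhs and rhs must have the same length");

    // Copy the left-hand side into std::strings, then append rhs element-wise.
    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(
        res.begin(), res.end(),
        rhs.begin(), res.begin(),
        Functor()
    );
    return wrap(res);
}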

/*** R

lhs <- letters[1:2]; rhs <- letters[3:4]

paste(lhs, rhs, sep = "")
# [1] "ac" "bd"

paste2(lhs, rhs)
# [1] "ac" "bd"

*/ 
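The same std::transform pattern, stripped of Rcpp so it compiles and runs on its own, might look like the sketch below. It assumes plain std::vector<std::string> inputs in place of CharacterVector; the names ConcatFunctor and paste_pairwise, and the length check, are illustrative additions rather than part of the Rcpp answer.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Binary functor in the spirit of the answer's Functor (hypothetical name):
// concatenate one element from each input.
struct ConcatFunctor {
    std::string operator()(const std::string& lhs, const std::string& rhs) const {
        return lhs + rhs;
    }
};

// Element-wise concatenation of two equal-length vectors,
// e.g. {"a","b"} and {"c","d"} give {"ac","bd"}.
std::vector<std::string> paste_pairwise(const std::vector<std::string>& lhs,
                                        const std::vector<std::string>& rhs) {
    // std::transform reads rhs in lockstep with lhs, so a shorter rhs would be
    // read out of bounds; reject mismatched lengths up front.
    if (lhs.size() != rhs.size()) {
        throw std::invalid_argument("lhs and rhs must have the same length");
    }
    std::vector<std::string> res(lhs);  // copy the left-hand side first
    // Writing the output back over the first input range is allowed.
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
                   ConcatFunctor());
    assert(res.size() == lhs.size());
    return res;
}
```

Copying the left-hand side first means the output buffer already has the right size, so std::transform only has to overwrite each slot in place.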

The reason for first copying the left-hand side into a std::vector<std::string> is that the internal::string_proxy<> class provides operator+ with the signature

std::string operator+(const std::string& x, const internal::string_proxy<STRSXP>& y)

rather than, for example,

operator+(const internal::string_proxy<STRSXP>& x, const internal::string_proxy<STRSXP>& y)

so the left operand of + must already be a std::string.
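A self-contained, plain-C++ illustration of that overload rule follows. StrProxy is a hypothetical stand-in, not Rcpp's string_proxy: it is given only the std::string-on-the-left operator+, so proxy + proxy has no viable overload until the left operand is turned into a std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Rcpp's internal::string_proxy<STRSXP>; it just
// wraps a C string. This is NOT Rcpp's class, only an illustration.
struct StrProxy {
    const char* data;
};

// The only overload provided, mirroring the signature quoted above:
// the left operand must already be a std::string.
std::string operator+(const std::string& x, const StrProxy& y) {
    return x + y.data;
}

std::string concat_two(const StrProxy& a, const StrProxy& b) {
    assert(a.data != nullptr && b.data != nullptr);
    // `a + b` would not compile: there is no operator+(StrProxy, StrProxy).
    // Converting the left operand to std::string selects the overload above.
    return std::string(a.data) + b;
}
```

This is the same reason paste2 and paste3 start from a std::vector<std::string> copy of lhs: each res element is a std::string, so res[i] + rhs[i] finds Rcpp's overload.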

If your compiler supports C++11, this can be done a bit more cleanly:

// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector paste3(CharacterVector lhs, CharacterVector rhs)
{
    using proxy_t = internal::string_proxy<STRSXP>;

    if (lhs.size() != rhs.size())
        stop("lhs and rhs must have the same length");

    std::vector<std::string> res(lhs.begin(), lhs.end());
    // The lambda uses nothing from the enclosing scope, so [] is enough.
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
        [](const std::string& x, const proxy_t& y) {
            return x + y;
        }
    );

    return wrap(res);
}

/*** R

lhs <- letters[1:2]; rhs <- letters[3:4]

paste(lhs, rhs, sep = "")
# [1] "ac" "bd"

paste3(lhs, rhs)
# [1] "ac" "bd"

*/

I'll leave this answer up, but note the warning from @nrussell about using push_back()!

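I'm still getting to grips with Rcpp myself, so I reached for a string builder inside a loop. Note that this collapses each input vector into a single string, so it returns "ab" "cd" rather than the element-wise "ac" "bd":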

library(Rcpp)

cppFunction('StringVector concatenate(StringVector a, StringVector b)
{
  StringVector c;
  std::ostringstream x;
  std::ostringstream y;

  // stream every element of a into x, and every element of b into y
  for (int i = 0; i < a.size(); i++)
    x << a[i];

  for (int i = 0; i < b.size(); i++)
    y << b[i];

  c.push_back(x.str());
  c.push_back(y.str());

  return c;

}')

a=c("a","b"); b=c("c","d");
concatenate(a,b)
# [1] "ab" "cd" 
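The difference between what that loop does and what the question asks for can be made explicit with a plain-C++ sketch. Here std::vector<std::string> stands in for StringVector, and the names collapse_each and pairwise are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the loop-based answer: every element of a is streamed into one
// builder and every element of b into another, so each input collapses
// into a single string.
std::vector<std::string> collapse_each(const std::vector<std::string>& a,
                                       const std::vector<std::string>& b) {
    std::ostringstream x, y;
    for (const auto& s : a) x << s;
    for (const auto& s : b) y << s;
    return {x.str(), y.str()};
}

// Element-wise variant: element i of a is joined with element i of b,
// which is the "ac" "bd" result the question asks for.
std::vector<std::string> pairwise(const std::vector<std::string>& a,
                                  const std::vector<std::string>& b) {
    assert(a.size() == b.size());
    std::vector<std::string> out(a.size());  // preallocate, then fill
    for (std::size_t i = 0; i < a.size(); ++i) {
        std::ostringstream oss;
        oss << a[i] << b[i];
        out[i] = oss.str();
    }
    return out;
}
```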

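Comparing the performance of (i) repeated calls to push_back with (ii) a preallocate-and-fill strategy shows that the latter is far preferable: in the benchmark below, its median time is about 60 times lower (1.78 ms versus 107.28 ms).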

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector pbpaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res;

    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res.push_back(oss.str());
    }

    return res;
}

// [[Rcpp::export]]
CharacterVector sspaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res(sz);

    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res[i] = oss.str();
    }

    return res;
}

/*** R

lhs <- as.character(1:5000); rhs <- as.character(5001:10000)

all.equal(pbpaste(lhs, rhs), sspaste(lhs, rhs))
# [1] TRUE

microbenchmark::microbenchmark(
    "push_back" = pbpaste(lhs, rhs),
    "preallocate" = sspaste(lhs, rhs),
    times = 200L
)
# Unit: milliseconds
#         expr        min         lq       mean     median         uq        max neval cld
#    push_back 101.521579 105.334649 115.156544 107.275678 110.957420 256.722239   200   b
#  preallocate   1.364213   1.585818   1.789564   1.778153   1.934758   2.955352   200   a

*/
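The gap comes from how the result grows. Rcpp vectors wrap fixed-length R vectors, so push_back cannot extend one in place; it allocates a vector one element longer and copies the existing elements into it. The plain-C++ sketch below models only that growth pattern, with std::vector<std::string> standing in for CharacterVector and a hypothetical function name. It counts element copies rather than measuring time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Model of an append that cannot grow in place: each call allocates a buffer
// one element larger and copies every existing element across.
// Returns how many element copies were made for n appends.
std::size_t copies_with_append(std::size_t n) {
    std::vector<std::string> v;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<std::string> bigger(v.size() + 1);
        for (std::size_t j = 0; j < v.size(); ++j) {
            bigger[j] = v[j];  // copy an already-stored element
            ++copies;
        }
        bigger.back() = std::to_string(i);  // the newly appended element
        v.swap(bigger);
    }
    assert(v.size() == n);
    return copies;
}
```

The count is 0 + 1 + ... + (n - 1) = n(n - 1)/2. For the 5,000-element benchmark above that is 12,497,500 copies in the push_back loop, whereas preallocate-and-fill writes each slot once and never copies previously stored elements.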

One workable solution is to use:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector concatenate(std::string x, std::string y)
{
    return wrap(x + y);
}

Then:

Vconcatenate=Vectorize(concatenate)
Vconcatenate(letters[1:2],letters[3:4])

Or:

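// [[Rcpp::export]]
CharacterVector concatenate(std::vector<std::string> x, std::vector<std::string> y)
{
  // Rcpp converts the R character vectors to std::vector<std::string> on entry.
  if (x.size() != y.size())
    stop("x and y must have the same length");

  std::vector<std::string> res(x.size());
  for (std::size_t i = 0; i < x.size(); i++)
  {
    res[i] = x[i] + y[i];
  }
  return wrap(res);
}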