Confused about concatenation of strings in Rcpp

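I am trying to loop through a data frame in Rcpp and concatenate the chunks of words that are separated by a space.

I tried reading some answers on Stack Overflow, but I am very confused about how string concatenation works in Rcpp.

I know that in C++ you can just use the + operator to add strings.

Below is my Rcpp function: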

cppFunction('
Rcpp::StringVector formTextBlocks(DataFrame frame) {
#include <string> 
using namespace Rcpp;
 NumericVector frame_x = as<NumericVector>(frame["x"]);

   LogicalVector space = as<LogicalVector>(frame["space"]);
   Rcpp::StringVector text=as<StringVector>(frame["text"]);
  if (text.size() == 0) {
    return text;
  }
  int dfSize = text.size();

  for(int i = 0;  i < dfSize; ++i) {
    if ( i !=dfSize  ) {
     if (space[i]==true) {

     text[i]=text[i] + text[i+1]  ;

    }
  }

  }
  return text;
}
')

The error is: error: no match for 'operator+'

How do I concatenate strings inside the loop?

Since operator+ is defined for std::string, the simplest approach is to convert the text column to a std::vector<std::string> instead of an Rcpp::StringVector:

Rcpp::cppFunction('
std::vector<std::string> formTextBlocks(DataFrame frame) {
  LogicalVector space = as<LogicalVector>(frame["space"]);
  std::vector<std::string> text=as<std::vector<std::string>>(frame["text"]);
  if (text.size() == 0) {
    return text;
  }
  int dfSize = text.size();

  for(int i = 0;  i < dfSize - 1; ++i) {
    if (space[i]==true) {
      text[i]=text[i] + text[i+1];
    }
  }
  return text;
}
')

set.seed(20191129)
textBlock <- data.frame(space = sample(c(TRUE, FALSE), 100, replace = TRUE),
                        text = sample(LETTERS, 100, replace = TRUE),
                        stringsAsFactors = FALSE)
formTextBlocks(textBlock)
#>   [1] "B"  "N"  "G"  "BM" "M"  "O"  "C"  "F"  "OQ" "Q"  "FH" "H"  "D"  "HK" "KH"
#>  [16] "H"  "S"  "LX" "XO" "OY" "Y"  "E"  "VD" "D"  "TN" "N"  "LL" "LQ" "Q"  "F" 
#>  [31] "XX" "X"  "S"  "R"  "P"  "L"  "M"  "GK" "KD" "DD" "D"  "H"  "M"  "M"  "K" 
#>  [46] "N"  "GP" "PG" "G"  "P"  "G"  "O"  "N"  "NY" "Y"  "OX" "X"  "LX" "XF" "FS"
#>  [61] "SE" "E"  "PS" "S"  "YD" "D"  "F"  "Z"  "H"  "ZN" "N"  "OM" "M"  "XH" "HV"
#>  [76] "V"  "OX" "X"  "J"  "BZ" "Z"  "FZ" "ZE" "E"  "SV" "V"  "G"  "F"  "DZ" "ZF"
#>  [91] "F"  "PB" "B"  "K"  "N"  "U"  "B"  "PV" "V"  "C"

Created on 2019-11-29 by the reprex package (v0.3.0)

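Notes:

  • I removed the #include and using lines. They are not needed and do not belong inside the function definition.
  • I removed the i != dfSize test, which is never false.
  • The loop runs one iteration shorter, because the body accesses element i+1.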
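
The notes above can also be checked without R. Below is a minimal sketch in plain C++, assuming a hypothetical helper named joinWithNext that takes the two columns as standard-library containers. The name and signature are made up for illustration and are not part of Rcpp or of the function above. It only shows that std::string supports operator+ and why the loop has to stop one element early.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for formTextBlocks: the same loop, but on plain
// standard-library containers so that it compiles without Rcpp.
std::vector<std::string> joinWithNext(const std::vector<bool>& space,
                                      std::vector<std::string> text) {
    // Assumption: both columns have the same length, as in a data frame.
    assert(space.size() == text.size());

    // Stop one element early, because the body reads text[i + 1].
    // Writing the bound as i + 1 < size() also covers the empty case.
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        if (space[i]) {
            text[i] = text[i] + text[i + 1];  // std::string has operator+
        }
    }
    return text;
}
```

With space = {true, false, true} and text = {"A", "B", "C"} this returns {"AB", "B", "C"}. The last flag has no effect because there is no following element to append.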