Replace lit with different string in boost spirit

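I'm trying to use Boost Spirit to parse a quoted string that contains escape sequences. I'm looking for a way to replace the escape sequence \" with the corresponding character (in this case "). So far I've come up with this: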

c_string %= lit('"') >> *(lit("\\\"")[push_back(_val, '"')] | (char_ - '"')) >> lit('"');

The replacement is done by

lit("\\\"")[push_back(_val, '"')]

However, this looks rather clumsy and hard to read to me. Is there a better way to accomplish this?

Iterating: you can replace "\\\"" with '\\' >> lit('"'). Reformatting a bit:

c_string
    %= lit('"')
    >> *(
           '\\' >> lit('"')[push_back(_val, '"')]
         | (char_ - '"')
    )
    >> lit('"')
    ;

Now you can drop some of the lit() calls, because they are implicit when a literal is combined with a proto expression in the Qi domain:

c_string
    %= '"'
    >> *(
           '\\' >> lit('"')[push_back(_val, '"')]
         | (char_ - '"')
    )
    >> '"'
    ;

Next up, lit(ch)[push_back(_val, ch)] is just a clumsy way of saying char_(ch):

c_string = '"'
    >> *( '\\' >> char_('"') | (char_ - '"') )
    >> '"';

Note that we no longer need the %= kludge either (see Boost Spirit: "Semantic actions are evil"?), and you can leave out the phoenix.hpp include(s).

Finally, you can optimize char_ - char_(xyz) by writing ~char_(xyz):

c_string = '"' >> *('\\' >> char_('"') | ~char_('"')) >> '"';

Now, you're not actually parsing C-style strings here: apart from \" you don't handle escapes at all. So why not simplify:

c_string = '"' >> *('\\' >> char_ | ~char_('"')) >> '"';

Note that now you actually parse backslash escapes, which you otherwise would not: for example, "a\\b" would be parsed into a\\b instead of a\b.

If you want to be more precise, consider handling escapes properly, as in e.g. Handling utf-8 in Boost.Spirit with utf-32 parser

Live Demo

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
namespace qi = boost::spirit::qi;

int main() {
    // '\\' >> qi::char_ unescapes any backslash-escaped character;
    // ~qi::char_('"') takes every other character except the closing quote
    const qi::rule<std::string::const_iterator, std::string()> c_string
        = '"' >> *('\\' >> qi::char_ | ~qi::char_('"')) >> '"';

    for (std::string const input: {
            R"("")"               , // empty quoted string
            R"("\"")"             , // a single escaped quote
            R"("Hello \"world\"")", // escaped quotes inside text
        })
    {
        std::string output;
        if (qi::parse(input.begin(), input.end(), c_string, output)) {
            std::cout << input << " -> " << output << "\n";
        } else {
            std::cout << "Failed: " << input << "\n";
        }
    }
}

Prints

"" -> 
"\"" -> "
"Hello \"world\"" -> Hello "world"