Boost.Spirit Parsing Optional Prefix
I'm trying to parse a string of space-separated, optionally tagged keywords. For example:
descr:expense type:receivable customer 27.3
where the expression before the colon is the tag, and the tag is optional (i.e. a default tag is assumed when it is missing).
I can't quite get the parser to do what I want. I've made some minor changes to the canonical example, whose purpose is to parse key/value pairs (much like an HTTP query string).
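#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/optional.hpp>
#include <string>
#include <utility>
#include <vector>
namespace qi = boost::spirit::qi;
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct field_value_sequence_default_field
    : qi::grammar<Iterator, pairs_type()>
{
    field_value_sequence_default_field()
        : field_value_sequence_default_field::base_type(query)
    {
        query = pair >> *(qi::lit(' ') >> pair);
        pair  = -(field >> ':') >> value;   // optional "tag:" prefix, then the keyword
        field = +qi::char_("a-zA-Z0-9");
        // '-' is placed last so it is a literal; in "+-." it would form the range '+'..'.'
        // (and "\." is not a valid C++ escape sequence)
        value = +qi::char_("a-zA-Z0-9+.-");
    }
    qi::rule<Iterator, pairs_type()> query;
    qi::rule<Iterator, pair_type()> pair;
    qi::rule<Iterator, std::string()> field, value;
};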
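However, when I parse it and the tag is left out, the optional<string> is not empty/false. Rather, it's got a copy of the value. The second part of the pair has the value as well.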
If the untagged keyword cannot be a tag (by the grammar rules, e.g. because it contains a decimal point), then things work as I'd expect.
What am I doing wrong? Is this a conceptual error with PEGs?
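This is the common pitfall with container attributes and backtracking. When the tag is absent, field has already appended the keyword (e.g. "expense") to the optional<string> attribute before ':' fails to match. Backtracking rewinds the input, but it does not undo what was written into the container attribute, so value then parses the same keyword a second time. Use qi::hold, which parses into a temporary copy of the attribute and only commits it if the embedded parser succeeds; see e.g. Understanding Boost.spirit's string parser: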
pair = -qi::hold[field >> ':'] >> value;
Full sample (Live On Coliru):
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <boost/optional/optional_io.hpp>
#include <iostream>
namespace qi = boost::spirit::qi;
typedef std::pair<boost::optional<std::string>, std::string> pair_type;
typedef std::vector<pair_type> pairs_type;
template <typename Iterator>
struct Grammar : qi::grammar<Iterator, pairs_type()>
{
    Grammar() : Grammar::base_type(query) {
        query = pair % ' ';
        // hold[] discards a partially parsed tag when ':' does not follow
        pair  = -qi::hold[field >> ':'] >> value;
        field = +qi::char_("a-zA-Z0-9");
        // '-' last so it is a literal; "\." is not a valid C++ escape sequence
        value = +qi::char_("a-zA-Z0-9+.-");
    }
  private:
    qi::rule<Iterator, pairs_type()> query;
    qi::rule<Iterator, pair_type()> pair;
    qi::rule<Iterator, std::string()> field, value;
};
int main()
{
    using It = std::string::const_iterator;
    for (std::string const input : {
            "descr:expense type:receivable customer 27.3",
            "expense type:receivable customer 27.3",
            "descr:expense receivable customer 27.3",
            "expense receivable customer 27.3",
        }) {
        It f = input.begin(), l = input.end();
        std::cout << "==== '" << input << "' =============\n";
        pairs_type data;
        if (qi::parse(f, l, Grammar<It>(), data)) {
            std::cout << "Parsed: \n";
            for (auto& p : data) {
                std::cout << p.first << "\t->'" << p.second << "'\n";
            }
        } else {
            std::cout << "Parse failed\n";
        }
        if (f != l)
            std::cout << "Remaining unparsed: '" << std::string(f, l) << "'\n";
    }
}
Prints:
==== 'descr:expense type:receivable customer 27.3' =============
Parsed:
descr ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense type:receivable customer 27.3' =============
Parsed:
-- ->'expense'
type ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'descr:expense receivable customer 27.3' =============
Parsed:
descr ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'
==== 'expense receivable customer 27.3' =============
Parsed:
-- ->'expense'
-- ->'receivable'
-- ->'customer'
-- ->'27.3'