How to overcome a Boost Spirit AST snafu

Question

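For starters, I have an AST in which I have to do a forward declaration, but apparently this is not exactly kosher in the latest C++ compilers?

Overcoming this, I believe I can work through the rest of the grammar. For reference, I am writing the parser more or less faithfully to the Google Protobuf v2 specification.

If memory serves, this perhaps has something to do with introducing a type definition? And/or Boost Spirit recursive descent, i.e. recursive_wrapper? But it has been a little while, and I am a bit fuzzy on those details. Would someone mind taking a look?

But for the forward declaration issue, I think the posted code is mostly grammar complete. TBD are the Protobuf service, rpc, stream, and, of course, comments.

There may be a couple of variant gremlins lurking in there as well that I am not sure what to do with; i.e. how to synthesize a "nil" or empty_statement, which, for instance, pops up a couple of times throughout the grammatical alternatives.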

Answer 1

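How does one end up with such a vast body of untested code? I think it makes sense to look at a minimized version of this code from scratch and stop at the earliest point where it stops working, instead of postponing sanity checks until it has become unmanageable.¹

I'm going to point you at some places where you can see what to do.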

Recursive using declaration with boost variant

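I have to warn that I don't think Qi supports std::variant or std::optional yet. I could be wrong.

Review and Fixup Round

I spent entirely too much time trying to fix the many issues, subtle and not so subtle.

I'll be happy to explain a bit, but for now I'm just dropping the result: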

Live On Coliru

#define BOOST_SPIRIT_DEBUG
#include <cstdint>   // intmax_t
#include <iostream>
#include <string>
#include <typeinfo>  // typeid, used by the debug operator<< below
#include <vector>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi_auto.hpp>
//#include <boost/container/vector.hpp>

namespace AST {
    using boost::variant;
    using boost::optional;

    enum class bool_t { false_, true_ };
    enum class syntax_t { proto2 };

    using str_t = std::string;

    struct full_id_t {
        std::string full_id;
    };

    using int_t = intmax_t;
    using float_t = double;

    /// See: http://www.boost.org/doc/libs/1_68_0/libs/spirit/example/qi/compiler_tutorial/calc8/ast.hpp
    /// Specifically, struct nil {}.
    struct empty_statement_t {};

    // TODO: TBD: we may need/want to dissect this one still further... i.e. to ident, message/enum-name, etc.
    struct element_type_t : std::string {
        using std::string::string;
        using std::string::operator=;
    };

    // TODO: TBD: let's not get too fancy with the inheritance, ...
    // TODO: TBD: however, scanning the other types, we could potentially do more of it, strategically, here and there
    struct msg_type_t : element_type_t {};
    struct enum_type_t : element_type_t {};

    struct package_t {
        std::string full_id;
    };

    using const_t = variant<full_id_t, int_t, float_t, str_t, bool_t>;

    struct import_modifier_t {
        std::string val;
    };

    struct import_t {
        optional<import_modifier_t> mod;
        std::string target_name;
    };

    struct option_t {
        std::string name;
        const_t val;
    };

    using label_t = std::string;

    using type_t = variant<std::string, msg_type_t, enum_type_t>;

    // TODO: TBD: could potentially get more meta-dissected based on the specification:
    struct field_opt_t {
        std::string name;
        const_t val;
    };

    struct field_t {
        label_t label; // this would benefit from being an enum instead
        type_t type;
        std::string name;
        int_t number;
        std::vector<field_opt_t> opts;
    };

    // TODO: TBD: add extend_t after msg_t ...
    struct field_t;
    struct enum_t;
    struct msg_t;
    struct extend_t;
    struct extensions_t;
    struct group_t;
    struct option_t;
    struct oneof_t;
    struct map_field_t;
    struct reserved_t;

    using msg_body_t = std::vector<variant<
        field_t,
        enum_t,
        msg_t,
        extend_t,
        extensions_t,
        group_t,
        option_t,
        oneof_t,
        map_field_t,
        reserved_t,
        empty_statement_t
    >>;

    struct group_t {
        label_t label;
        std::string name;
        int_t number;
        msg_body_t body;
    };

    struct oneof_field_t {
        type_t type;
        std::string name;
        int_t number;
        optional<std::vector<field_opt_t>> opts;
    };

    struct oneof_t {
        std::string name;
        std::vector<variant<oneof_field_t, empty_statement_t>> choices;
    };

    struct key_type_t {
        std::string val;
    };

    struct map_field_t {
        key_type_t key_type;
        type_t type;
        std::string name;
        int_t number;
        optional<std::vector<field_opt_t>> opts;
    };

    struct range_t {
        int_t min;
        optional<int_t> max;
    };

    struct extensions_t {
        std::vector<range_t> ranges;
    };

    struct reserved_t {
        variant<std::vector<range_t>, std::vector<std::string>> val;
    };

    struct enum_val_opt_t {
        std::string name;
        const_t val;
    };

    struct enum_field_t {
        std::string name;
        std::string ordinal;
        std::vector<enum_val_opt_t> opt; // consistency
    };

    using enum_body_t = std::vector<variant<option_t, enum_field_t, empty_statement_t> >;

    struct enum_t {
        std::string name;
        enum_body_t body;
    };

    struct msg_t {
        std::string name;
        // TODO: TBD: here is another case where forward declaration is necessary in terms of the AST definition.
        msg_body_t body;
    };

    struct extend_t {
        using content_t = variant<field_t, group_t, empty_statement_t>;

        // TODO: TBD: actually, this use case may beg the question whether
        // "message type", et al, in some way deserve a first class definition?
        msg_type_t msg_type;
        std::vector<content_t> content;
    };

    struct top_level_def_t {
        // TODO: TBD: may add svc_t after extend_t ...
        variant<msg_t, enum_t, extend_t> content;
    };

    struct proto_t {
        syntax_t syntax;
        std::vector<variant<import_t, package_t, option_t, top_level_def_t, empty_statement_t>> content;
    };

    template <typename T>
    static inline std::ostream& operator<<(std::ostream& os, T const&) {
        std::operator<<(os, "[");
        std::operator<<(os, typeid(T).name());
        std::operator<<(os, "]");
        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(AST::option_t, name, val)
BOOST_FUSION_ADAPT_STRUCT(AST::full_id_t, full_id)
BOOST_FUSION_ADAPT_STRUCT(AST::package_t, full_id)
BOOST_FUSION_ADAPT_STRUCT(AST::import_modifier_t, val)
BOOST_FUSION_ADAPT_STRUCT(AST::import_t, mod, target_name)
BOOST_FUSION_ADAPT_STRUCT(AST::field_opt_t, name, val)
BOOST_FUSION_ADAPT_STRUCT(AST::field_t, label, type, name, number, opts)
BOOST_FUSION_ADAPT_STRUCT(AST::group_t, label, name, number, body)
BOOST_FUSION_ADAPT_STRUCT(AST::oneof_field_t, type, name, number, opts)
BOOST_FUSION_ADAPT_STRUCT(AST::oneof_t, name, choices)
BOOST_FUSION_ADAPT_STRUCT(AST::key_type_t, val)
BOOST_FUSION_ADAPT_STRUCT(AST::map_field_t, key_type, type, name, number, opts)
BOOST_FUSION_ADAPT_STRUCT(AST::range_t, min, max)
BOOST_FUSION_ADAPT_STRUCT(AST::extensions_t, ranges)
BOOST_FUSION_ADAPT_STRUCT(AST::reserved_t, val)
BOOST_FUSION_ADAPT_STRUCT(AST::enum_val_opt_t, name, val)
BOOST_FUSION_ADAPT_STRUCT(AST::enum_field_t, name, ordinal, opt)
BOOST_FUSION_ADAPT_STRUCT(AST::enum_t, name, body)

BOOST_FUSION_ADAPT_STRUCT(AST::msg_t, name, body)
BOOST_FUSION_ADAPT_STRUCT(AST::extend_t, msg_type, content)
BOOST_FUSION_ADAPT_STRUCT(AST::top_level_def_t, content)
BOOST_FUSION_ADAPT_STRUCT(AST::proto_t, syntax, content)

namespace qi = boost::spirit::qi;

template<typename It>
struct ProtoGrammar : qi::grammar<It, AST::proto_t()> {

    using char_rule_type   = qi::rule<It, char()>;
    using string_rule_type = qi::rule<It, std::string()>;
    using skipper_type     = qi::space_type;

    ProtoGrammar() : ProtoGrammar::base_type(start) {

        using qi::lit;
        using qi::digit;
        using qi::lexeme; // redundant, because no rule declares a skipper
        using qi::char_;

        // Identifiers
        id = lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
        full_id      = id;
        msg_name     = id;
        enum_name    = id;
        field_name   = id;
        oneof_name   = id;
        map_name     = id;
        service_name = id;
        rpc_name     = id;
        stream_name  = id;

        // These distinctions aren't very useful until the semantic analysis
        // stage. I'd suggest not conflating that with parsing.
        msg_type  = qi::as_string[ -char_('.') >> *(qi::hold[id >> char_('.')]) >> msg_name ];
        enum_type = qi::as_string[ -char_('.') >> *(qi::hold[id >> char_('.')]) >> enum_name ];

        // group_name = lexeme[qi::upper >> *char_("A-Za-z0-9_")];
        // simpler:
        group_name = &qi::upper >> id;

        // Integer literals
        oct_lit = &char_('0')       >> qi::uint_parser<AST::int_t, 8>{};
        hex_lit = qi::no_case["0x"] >> qi::uint_parser<AST::int_t, 16>{};
        dec_lit =                      qi::uint_parser<AST::int_t, 10>{};
        int_lit = lexeme[hex_lit | oct_lit | dec_lit]; // ordering is important

        // Floating-point literals
        float_lit = qi::real_parser<double, qi::strict_real_policies<double> >{};

        // String literals
        oct_esc  = '\\' >> qi::uint_parser<unsigned char, 8, 3, 3>{};
        hex_esc  = qi::no_case["\\x"] >> qi::uint_parser<unsigned char, 16, 2, 2>{};
        // The last bit in this phrase is literally, "Or Any Characters Not in the Sequence" (fixed)
        char_val = hex_esc | oct_esc | char_esc | ~char_("\0\n\\");
        str_lit  = lexeme["'" >> *(char_val - "'") >> "'"]
            | lexeme['"' >> *(char_val - '"') >> '"']
            ;

        // Empty Statement - likely redundant
        empty_statement = ';' >> qi::attr(AST::empty_statement_t{});

        // Constant
        const_
            = bool_lit
            | str_lit
            | float_lit // again, ordering is important
            | int_lit
            | full_id
            ;

        // keyword helper
        #define KW(p) (lexeme[(p) >> !(qi::alnum | '_')])
        // Syntax
        syntax = KW("syntax") >> '=' >> lexeme[ lit("'proto2'") | "\"proto2\"" ] >> ';' >> qi::attr(AST::syntax_t::proto2);

        // Import Statement
        import_modifier = KW("weak") | KW("public");
        import = KW("import") >> -import_modifier >> str_lit >> ';';

        // Package
        package = KW("package") >> full_id >> ';';

        // Option
        opt_name = qi::raw[ (id | '(' >> full_id >> ')') >> *('.' >> id) ];

        opt = KW("option") >> opt_name >> '=' >> const_ >> ';';

        // Fields
        field_num = int_lit;
        label = KW("required")
            | KW("optional")
            | KW("repeated")
            ;

        type 
            = KW(builtin_type)
            | msg_type
            | enum_type
            ;

        // Normal field
        field_opt  = opt_name >> '=' >> const_;
        field_opts = -('[' >> field_opt % ',' >> ']');
        field      = label >> type >> field_name >> '=' >> field_num >> field_opts >> ';';

        // Group field
        group      = label >> KW("group") >> group_name >> '=' >> field_num >> msg_body;

        // Oneof and oneof field
        oneof_field = type >> field_name >> '=' >> field_num >> field_opts >> ';';
        oneof       = KW("oneof") >> oneof_name >> '{'
            >> *(
                    oneof_field
                    // TODO: TBD: ditto how to handle "empty" not synthesizing any attributes ...
                    | empty_statement
                ) >> '}';

        // Map field
        key_type = KW(builtin_type);

        // mapField = "map" "<" keyType "," type ">" mapName "=" fieldNumber [ "[" fieldOptions "]" ] ";"
        map_field = KW("map") >> '<' >> key_type >> ',' >> type >> '>' >> map_name
            >> '=' >> field_num >> field_opts >> ';';

        // Extensions and Reserved, Extensions ...
        range      = int_lit >> -(KW("to") >> (int_lit | KW("max")));
        ranges     = range % ',';
        extensions = KW("extensions") >> ranges >> ';';

        // Reserved
        reserved    = KW("reserved") >> (ranges | field_names) >> ';';
        field_names = field_name % ',';

        // Enum definition
        enum_val_opt  = opt_name >> '=' >> const_;
        enum_val_opts = -('[' >> (enum_val_opt % ',') >> ']');
        enum_field    = id >> '=' >> int_lit >> enum_val_opts >> ';';
        enum_body     = '{' >> *(opt | enum_field | empty_statement) >> '}';
        enum_         = KW("enum") >> enum_name >> enum_body;

        // Message definition
        msg = KW("message") >> msg_name >> msg_body;
        msg_body = '{' >> *(
                field
                | enum_
                | msg
                | extend
                | extensions
                | group
                | opt
                | oneof
                | map_field
                | reserved
                //// TODO: TBD: how to "include" an empty statement ... ? "empty" does not synthesize anything, right?
                | empty_statement
                ) >> '}';

        // Extend
        extend_content = field | group | empty_statement;
        extend_contents = '{' >> *extend_content >> '}';
        extend = KW("extend") >> msg_type >> extend_contents;

        top_level_def = msg | enum_ | extend /*| service*/;
        proto = syntax >> *(import | package | opt | top_level_def | empty_statement);
        start = qi::skip(qi::space) [ proto ];

        BOOST_SPIRIT_DEBUG_NODES(
            (id) (full_id) (msg_name) (enum_name) (field_name) (oneof_name)
            (map_name) (service_name) (rpc_name) (stream_name) (group_name)
            (msg_type) (enum_type)
            (oct_lit) (hex_lit) (dec_lit) (int_lit)
            (float_lit)
            (oct_esc) (hex_esc) (char_val) (str_lit)
            (empty_statement)
            (const_)
            (syntax)
            (import_modifier) (import)
            (package)
            (opt_name) (opt)
            (field_num)
            (label)
            (type)
            (field_opt) (field_opts) (field)
            (group)
            (oneof_field)
            (oneof)
            (key_type) (map_field)
            (range) (ranges) (extensions) (reserved)
            (field_names)
            (enum_val_opt) (enum_val_opts) (enum_field) (enum_body) (enum_)
            (msg) (msg_body)
            (extend_content) (extend_contents) (extend)
            (top_level_def) (proto))
    }

  private:
    struct escapes_t : qi::symbols<char, char> {
        escapes_t() { this->add
                ("\\a",  '\a')
                ("\\b",  '\b')
                ("\\f",  '\f')
                ("\\n",  '\n')
                ("\\r",  '\r')
                ("\\t",  '\t')
                ("\\v",  '\v')
                ("\\\\", '\\')
                ("\\\'", '\'')
                ("\\\"", '"');
        }
    } char_esc;

    string_rule_type id, full_id, msg_name, enum_name, field_name, oneof_name,
                     map_name, service_name, rpc_name, stream_name, group_name;

    qi::rule<It, AST::msg_type_t(), skipper_type> msg_type;
    qi::rule<It, AST::enum_type_t(), skipper_type> enum_type;

    qi::rule<It, AST::int_t()> int_lit, dec_lit, oct_lit, hex_lit;
    qi::rule<It, AST::float_t()> float_lit;

    /// true | false
    struct bool_lit_t : qi::symbols<char, AST::bool_t> {
        bool_lit_t() { this->add
            ("true", AST::bool_t::true_)
            ("false", AST::bool_t::false_);
        }
    } bool_lit;

    char_rule_type oct_esc, hex_esc, char_val;
    qi::rule<It, AST::str_t()> str_lit;

    // TODO: TBD: there are moments when this is a case in a variant or vector<variant>
    qi::rule<It, AST::empty_statement_t(), skipper_type> empty_statement;

    qi::rule<It, AST::const_t(), skipper_type> const_;

    /// syntax = {'proto2' | "proto2"} ;
    qi::rule<It, AST::syntax_t(), skipper_type> syntax;

    /// import [weak|public] <targetName/> ;
    qi::rule<It, AST::import_t(), skipper_type> import;
    qi::rule<It, AST::import_modifier_t(), skipper_type> import_modifier;

    /// package <fullIdent/> ;
    qi::rule<It, AST::package_t(), skipper_type> package;

    /// option <optionName/> = <const/> ;
    qi::rule<It, AST::option_t(), skipper_type> opt;
    /// <ident/> | "(" <fullIdent/> ")" ("." <ident/>)*
    string_rule_type opt_name;

    qi::rule<It, AST::label_t(), skipper_type> label;
    qi::rule<It, AST::type_t(), skipper_type> type;

    struct builtin_type_t : qi::symbols<char, std::string> {
        builtin_type_t() { this->add
            ("double", "double")
            ("float", "float")
            ("int32", "int32")
            ("int64", "int64")
            ("uint32", "uint32")
            ("uint64", "uint64")
            ("sint32", "sint32")
            ("sint64", "sint64")
            ("fixed32", "fixed32")
            ("fixed64", "fixed64")
            ("sfixed32", "sfixed32")
            ("sfixed64", "sfixed64")
            ("bool", "bool")
            ("string", "string")
            ("bytes", "bytes");
        }
    } builtin_type;
    qi::rule<It, AST::int_t()> field_num;

    qi::rule<It, AST::field_opt_t(), skipper_type> field_opt;
    qi::rule<It, std::vector<AST::field_opt_t>(), skipper_type> field_opts;
    qi::rule<It, AST::field_t(), skipper_type> field;

    qi::rule<It, AST::group_t(), skipper_type> group;

    qi::rule<It, AST::oneof_t(), skipper_type> oneof;
    qi::rule<It, AST::oneof_field_t(), skipper_type> oneof_field;

    qi::rule<It, AST::key_type_t(), skipper_type> key_type;
    qi::rule<It, AST::map_field_t(), skipper_type> map_field;

    /// <int/> [ to ( <int/> | "max" ) ]
    qi::rule<It, AST::range_t(), skipper_type> range;
    qi::rule<It, std::vector<AST::range_t>(), skipper_type> ranges;
    /// extensions <ranges/> ;
    qi::rule<It, AST::extensions_t(), skipper_type> extensions;

    /// reserved <ranges/>|<fieldNames/> ;
    qi::rule<It, AST::reserved_t(), skipper_type> reserved;
    qi::rule<It, std::vector<std::string>(), skipper_type> field_names;

    /// <optionName/> = <constant/>
    qi::rule<It, AST::enum_val_opt_t(), skipper_type> enum_val_opt;
    qi::rule<It, std::vector<AST::enum_val_opt_t>(), skipper_type> enum_val_opts;
    /// <ident/> = <int/> [ +<enumValueOption/> ] ;
    qi::rule<It, AST::enum_field_t(), skipper_type> enum_field;
    qi::rule<It, AST::enum_body_t(), skipper_type> enum_body;
    qi::rule<It, AST::enum_t(), skipper_type> enum_;

    // TODO: TBD: continue here: https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#message_definition
    /// message <messageName/> <messageBody/>
    qi::rule<It, AST::msg_t(), skipper_type> msg;
    /// *{ <field/> | <enum/> | <message/> | <extend/> | <extensions/> | <group/>
    ///    | <option/> | <oneof/> | <mapField/> | <reserved/> | <emptyStatement/> }
    qi::rule<It, AST::msg_body_t(), skipper_type> msg_body;

    // TODO: TBD: not sure how appropriate it would be to reach these cases, but we'll see what happens...
    /// extend <messageType/> *{ <field/> | <group/> | <emptyStatement/> }
    qi::rule<It, AST::extend_t::content_t(), skipper_type> extend_content;
    qi::rule<It, std::vector<AST::extend_t::content_t>(), skipper_type> extend_contents;
    qi::rule<It, AST::extend_t(), skipper_type> extend;

    // TODO: TBD: ditto comments in the rule definition section.
    // service; rpc; stream;

    /// topLevelDef = <message/> | <enum/> | <extend/> | <service/>
    qi::rule<It, AST::top_level_def_t(), skipper_type> top_level_def;
    /// <syntax/> { <import/> | <package/> | <option/> | <option/> | <emptyStatement/> }
    qi::rule<It, AST::proto_t(), skipper_type> proto;
    qi::rule<It, AST::proto_t()> start;
};

#include <fstream>
int main() {
    std::ifstream ifs("sample.proto");
    std::string const input(std::istreambuf_iterator<char>(ifs), {});

    using It = std::string::const_iterator;
    It f = input.begin(), l = input.end();

    ProtoGrammar<It> const g;
    AST::proto_t parsed;
    bool ok = qi::parse(f, l, g, parsed);

    if (ok) {
        std::cout << "Parse succeeded\n";
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l) {
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
    }
}

With a sample input of

syntax = "proto2";
import "demo_stuff.proto";

package StackOverflow;

message Sample {
    optional StuffMsg foo_list = 1;
    optional StuffMsg bar_list = 2;
    optional StuffMsg qux_list = 3;
}

message TransportResult {
    message Sentinel {}
    oneof Chunk {
        Sample payload         = 1;
        Sentinel end_of_stream = 2;
    }
}

message ShowTime {
    optional uint32 magic = 1 [ default = 0xBDF69E88 ];
    repeated string parameters = 2;
    optional string version_info = 3;
}

Prints

<proto>
  <try>syntax = "proto2";\ni</try>
  <syntax>
    <try>syntax = "proto2";\ni</try>
    <success>\nimport "demo_stuff.</success>
    <attributes>[[N3AST8syntax_tE]]</attributes>
  </syntax>
  <import>
    <try>\nimport "demo_stuff.</try>
    <import_modifier>
      <try> "demo_stuff.proto";</try>
      <fail/>
    </import_modifier>
    <str_lit>
      <try>"demo_stuff.proto";\n</try>
    [ ...
           much 
                 snipped
                          ... ]
  <empty_statement>
    <try>\n\n</try>
    <fail/>
  </empty_statement>
  <success>\n\n</success>
  <attributes>[[[N3AST8syntax_tE], [[[empty], [d, e, m, o, _, s, t, u, f, f, ., p, r, o, t, o]], [[S, t, a, c, k, O, v, e, r, f, l, o, w]], [[[S, a, m, p, l, e], [[[], [S, t, u, f, f, M, s, g], [f, o, o, _, l, i, s, t], 1, []], [[], [S, t, u, f, f, M, s, g], [b, a, r, _, l, i, s, t], 2, []], [[], [S, t, u, f, f, M, s, g], [q, u, x, _, l, i, s, t], 3, []]]]], [[[T, r, a, n, s, p, o, r, t, R, e, s, u, l, t], [[[S, e, n, t, i, n, e, l], []], [[C, h, u, n, k], [[[S, a, m, p, l, e], [p, a, y, l, o, a, d], 1, []], [[S, e, n, t, i, n, e, l], [e, n, d, _, o, f, _, s, t, r, e, a, m], 2, []]]]]]], [[[S, h, o, w, T, i, m, e], [[[], [u, i, n, t, 3, 2], [m, a, g, i, c], 1, [[[d, e, f, a, u, l, t], 3187056264]]], [[], [s, t, r, i, n, g], [p, a, r, a, m, e, t, e, r, s], 2, []], [[], [s, t, r, i, n, g], [v, e, r, s, i, o, n, _, i, n, f, o], 3, []]]]]]]]</attributes>
</proto>
Parse succeeded
Remaining unparsed input: '

'

¹ (Conflating "recursive descent", a parsing concept, with recursive variants is confusing too.)

² Sadly it exceeds the capacity of both Wandbox and Coliru.

Answer 2

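I'll just summarize a couple of key points I observed while digesting this. First, wow, some of it I do not think is documented in the Spirit Qi pages, etc., unless you happened to hear about it from a little bird. That is to say, thanks so much for the insights!

Interesting that you transformed things directly to the language level wherever possible. For instance, bool_t, deriving directly from std::string, and even syntax_t, to name a few. I did not think one could even do that from a parser/AST perspective, but it makes sense.

Very interesting, deriving from std::string. As above, I did not know that.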

struct element_type_t : std::string {
    using std::string::string;
    using std::string::operator=;
};

With particular emphasis on the string constructors and operator=, which I assume help the parser rules, attribute propagation, etc.
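To convince myself what those two using-declarations buy, here is a minimal sketch in plain standard C++ (no Spirit involved). The namespace `sketch` and the helper `from_range` are hypothetical names for illustration only; whether Spirit relies on exactly these members for attribute propagation is an assumption I have not verified.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

namespace sketch {
    // Same shape as element_type_t in the answer.
    struct element_type_t : std::string {
        using std::string::string;     // inherit std::string's constructors
        using std::string::operator=;  // inherit std::string's assignment operators
    };

    // Distinct types that share the std::string representation.
    // Constructors are not inherited a second time, so these are
    // default-constructed and then filled through the string interface.
    struct msg_type_t  : element_type_t {};
    struct enum_type_t : element_type_t {};

    static_assert(!std::is_same<msg_type_t, enum_type_t>::value,
                  "the two element types are distinguishable in a variant");

    // Hypothetical helper: builds an element_type_t from a character range,
    // using the inherited iterator-pair constructor.
    inline element_type_t from_range(std::string const& text) {
        return element_type_t(text.begin(), text.end());
    }
}
```

So `sketch::element_type_t e("foo"); e = "bar";` compiles thanks to the inherited members, while `msg_type_t` still exposes `append`, `push_back` and friends because it is-a std::string.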

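Yes, I wondered about support for std::optional and std::variant, but that makes sense considering the maturity of Boost.Spirit. Good point about leveraging the equivalent Boost constructs in lieu of std.

I did not know you could define aliases. That makes better sense than defining a first-class struct. For instance,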

using const_t = variant<full_id_t, int_t, float_t, str_t, bool_t>;

Interesting label_t alias. Although I may pursue making it a language-level enum with a corresponding rule attribute. Still, up-vote for so much effort here.

using label_t = std::string;

Then the forward declarations and the alias in the problem area, msg_body_t. Interesting, I had no idea. Really.

struct field_t;
struct enum_t;
struct msg_t;
struct extend_t;
struct extensions_t;
struct group_t;
struct option_t;
struct oneof_t;
struct map_field_t;
struct reserved_t;

using msg_body_t = std::vector<variant<
    field_t,
    enum_t,
    msg_t,
    extend_t,
    extensions_t,
    group_t,
    option_t,
    oneof_t,
    map_field_t,
    reserved_t,
    empty_statement_t
>>;

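Still, I'm not sure how this avoids the C++ C2079 (VS2017) forward declaration issue? I will have to double-check my project code, but it evidently ran for you, so something about it must be more kosher than I was thinking.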
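My working theory, as a sketch only: a type alias instantiates nothing, and a std::vector member does not need its element type to be complete at the point of declaration (guaranteed since C++17). The variant is only instantiated later, once every alternative is complete. The example below uses std::variant as a stand-in for boost::variant and a cut-down AST; the namespace `sketch` and `count_messages` are hypothetical, and I have not checked how a particular Boost or MSVC version behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

namespace sketch {
    struct field_t { std::string name; };
    struct empty_statement_t {};

    struct msg_t;  // forward declaration: msg_t is incomplete from here on

    // Naming the alias instantiates neither the vector nor the variant.
    using msg_body_t = std::vector<std::variant<field_t, msg_t, empty_statement_t>>;

    struct msg_t {
        std::string name;
        msg_body_t body;  // a vector member tolerates an incomplete element type
    };
    // msg_t is complete now; the variant gets instantiated on first real use.

    // Counts the message itself plus all nested messages.
    inline std::size_t count_messages(msg_t const& m) {
        std::size_t n = 1;
        for (auto const& item : m.body)
            if (auto const* nested = std::get_if<msg_t>(&item))
                n += count_messages(*nested);
        return n;
    }
}
```

By contrast, declaring a data member whose type is itself still incomplete (say, a `msg_t` held by value inside another struct before `msg_t` is defined) is the kind of thing that gets reported as "uses undefined class".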

BOOST_FUSION_ADAPT_STRUCT(AST::option_t, name, val)
// etc ...

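I imagine this simplifies the struct adaptation significantly.

Eventually, yes, I would want a skipper involved. I had not gotten that far yet when I stumbled onto the forward declaration issue.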

using skipper_type = qi::space_type;
// ...
start = qi::skip(qi::space) [ proto ];
// ...
qi::rule<It, AST::msg_type_t(), skipper_type> msg_type;

For many of the rule definitions, = or %=? The prevailing wisdom I have heard over the years is to prefer %=. Your thoughts? I.e.

id = lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
// ^, or:
id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
// ^^ ?

It makes sense for these to land in language-friendly AST attributes:

oct_lit = &char_('0')       >> qi::uint_parser<AST::int_t, 8>{};
hex_lit = qi::no_case["0x"] >> qi::uint_parser<AST::int_t, 16>{};
dec_lit =                      qi::uint_parser<AST::int_t, 10>{};
int_lit = lexeme[hex_lit | oct_lit | dec_lit]; // ordering is important
// Yes, I understand why, because 0x... | 0... | dig -> that to say, great point!
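To spell out the ordering argument for myself, here is a hand-rolled stand-in for the alternative `hex_lit | oct_lit | dec_lit`. It is not Spirit code: `parse_int_lit` is a hypothetical helper built on std::from_chars, and it assumes `pos` never exceeds the input length.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Tries hex, then octal, then decimal. On success returns the value and
// advances `pos` past the consumed characters; on failure `pos` is unchanged.
inline std::optional<std::uintmax_t> parse_int_lit(std::string_view in, std::size_t& pos) {
    auto digits = [&](std::size_t start, int base) -> std::optional<std::uintmax_t> {
        std::uintmax_t value = 0;
        char const* first = in.data() + start;
        char const* last  = in.data() + in.size();
        auto [end, ec] = std::from_chars(first, last, value, base);
        if (ec != std::errc{} || end == first) return std::nullopt;
        pos = static_cast<std::size_t>(end - in.data());
        return value;
    };

    std::string_view const rest = in.substr(pos);
    // 1. hex first: otherwise "0x1F" would be read as "0" with "x1F" left over
    if (rest.size() > 2 && rest[0] == '0' && (rest[1] == 'x' || rest[1] == 'X'))
        if (auto v = digits(pos + 2, 16)) return v;
    // 2. octal: peek at a leading '0' without consuming it, like &char_('0')
    if (!rest.empty() && rest[0] == '0')
        if (auto v = digits(pos, 8)) return v;
    // 3. decimal as the fallback
    return digits(pos, 10);
}
```

With this order, the default value `0xBDF69E88` from the sample input comes out as 3187056264, which agrees with the number in the debug output.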

Perhaps I did not spend enough time exploring the bits that Qi exposes here, i.e. qi::upper and so on, but this is a great point:

group_name = &qi::upper >> id;

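I did not know this operator was a thing for char_. I do not think it is documented, however, unless you happened to hear about it from a little bird:

//         Again, great points re: numerical/parser ordering.
char_val = hex_esc | oct_esc | char_esc | ~char_("\0\n\\");
//                                        ^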

Not sure what you mean here by "likely redundant". However, it is VERY interesting that you can produce the attribute right here. I like that a lot.

// Empty Statement - likely redundant
empty_statement = ';' >> qi::attr(AST::empty_statement_t{});
//                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

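If you mean whether the semicolon is redundant, I thought so as well at first. Then I studied the rest of the grammar and, rather than second-guess it, no, I agree with the grammar: an "empty statement" really is an empty statement, at least when you accept it in the context of the grammatical alternatives. Sometimes, however, the semicolon does mean what you and I thought it did, that is, "end of statement" or eos, which raised my eyebrow at first as well.

I also did not know you could attribute things directly within a Qi rule. In fact, the general guidance I had been following was to avoid semantic actions. But I suppose those are a different animal from qi::attr(...) per se.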
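The way I picture the two roles of the semicolon, as a toy sketch in plain C++ rather than Qi: a lone ';' yields a value of the tag type even though nothing "was parsed" (roughly what qi::attr does for the rule above), while a ';' that follows a statement merely terminates it. `word_t`, `statement_t` and `parse_statements` are hypothetical names, and the toy grammar (words separated by semicolons) is far simpler than proto2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

struct empty_statement_t {};             // tag type, like "nil" in the calc8 example
struct word_t { std::string text; };
using statement_t = std::variant<word_t, empty_statement_t>;

inline std::vector<statement_t> parse_statements(std::string_view in) {
    std::vector<statement_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == ';') {
            // A lone ';' is an empty statement: synthesize the tag value.
            out.emplace_back(empty_statement_t{});
            ++i;
            continue;
        }
        std::size_t const end = in.find(';', i);
        if (end == std::string_view::npos) break;  // unterminated: stop here
        out.emplace_back(word_t{std::string(in.substr(i, end - i))});
        i = end + 1;  // here the ';' is only the end-of-statement marker
    }
    return out;
}
```

For the input `foo;;bar;` this produces three entries: a word, an empty statement, and another word.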

Nice approach here. Plus, it lends consistency to the rule definitions. I cannot up-vote this one enough, among others.

#define KW(p) (lexeme[(p) >> !(qi::alnum | '_')])
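As I read it, the macro matches the keyword and then asserts, without consuming anything, that no identifier character follows. A hand-written analogue in plain C++ (the function `match_keyword` is hypothetical and assumes `pos` does not exceed the input length):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Matches `kw` at `pos` and requires that the next character, if any,
// is neither alphanumeric nor '_'. Advances `pos` only on success.
inline bool match_keyword(std::string_view in, std::size_t& pos, std::string_view kw) {
    if (in.substr(pos, kw.size()) != kw) return false;
    std::size_t const next = pos + kw.size();
    if (next < in.size()) {
        unsigned char const c = static_cast<unsigned char>(in[next]);
        if (std::isalnum(c) || c == '_') return false;  // negative lookahead failed
    }
    pos = next;
    return true;
}
```

That is what keeps "option" from matching the front of "optional", and "map" from matching the front of an identifier such as "map_name".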

Here I am considering language-level enumerated values, but it is interesting nonetheless:

label = KW("required")
    | KW("optional")
    | KW("repeated")
    ;
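What I have in mind for the enum variant, sketched without Spirit: a small lookup table from keyword to enumerator. In Qi, I assume this would become a qi::symbols<char, label_t> along the lines of bool_lit_t in the answer, but that part is untested; `label_t` as an enum and `parse_label` are my own hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <utility>

enum class label_t { required, optional, repeated };

// Maps a complete keyword to its enumerator; anything else yields nullopt.
inline std::optional<label_t> parse_label(std::string_view word) {
    static constexpr std::pair<std::string_view, label_t> table[] = {
        {"required", label_t::required},
        {"optional", label_t::optional},
        {"repeated", label_t::repeated},
    };
    for (auto const& [keyword, value] : table)
        if (word == keyword) return value;
    return std::nullopt;
}
```

The trade-off, as far as I can tell, is that the AST no longer carries the raw keyword text, which seems fine since only three spellings are legal.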

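The long and the short of it here is that fewer rules are involved. It is a bit messier in terms of all the strings, etc., but I like that it reads more or less one-to-one with the grammar that informs the definition.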

    // mapField = "map" "<" keyType "," type ">" mapName "=" fieldNumber [ "[" fieldOptions "]" ] ";"
    map_field = KW("map") >> '<' >> key_type >> ',' >> type >> '>' >> map_name
        >> '=' >> field_num >> field_opts >> ';';

I wondered whether Qi symbols might be useful, but I had no idea these bits would be this useful:

struct escapes_t : qi::symbols<char, char> {
    escapes_t() { this->add
            ("\\a",  '\a')
            ("\\b",  '\b')
            ("\\f",  '\f')
            ("\\n",  '\n')
            ("\\r",  '\r')
            ("\\t",  '\t')
            ("\\v",  '\v')
            ("\\\\", '\\')
            ("\\\'", '\'')
            ("\\\"", '"');
    }
} char_esc;
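For my own notes: each key in that table is a two-character sequence in the input (a backslash followed by a letter or symbol), so in C++ source the backslash in the key has to be doubled. A plain C++ sketch of the same lookup, with hypothetical helpers `simple_escape` and `unescape` and no Spirit involved:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Maps the character after a backslash to the character it denotes.
inline std::optional<char> simple_escape(char c) {
    switch (c) {
        case 'a':  return '\a';
        case 'b':  return '\b';
        case 'f':  return '\f';
        case 'n':  return '\n';
        case 'r':  return '\r';
        case 't':  return '\t';
        case 'v':  return '\v';
        case '\\': return '\\';
        case '\'': return '\'';
        case '"':  return '"';
        default:   return std::nullopt;
    }
}

// Replaces every recognized two-character escape with its single character;
// unrecognized sequences are copied through unchanged.
inline std::string unescape(std::string_view in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            if (auto c = simple_escape(in[i + 1])) {
                out += *c;
                ++i;  // skip the escaped character as well
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

Hex and octal escapes are left out here; in the grammar they are separate rules (hex_esc, oct_esc) tried before the symbol table.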

Ditto for symbols, up-vote:

struct builtin_type_t : qi::symbols<char, std::string> { /* ... */ };

All in all, I am very impressed. Thank you so much for the insights.

Answer 3

I think there was a slight oversight in the ranges. Referring to the proto2 Extensions specification, we literally have:

range =  intLit [ "to" ( intLit | "max" ) ]

Then, adjusting the AST:

enum range_max_t { max };

struct range_t {
    int_t min;
    boost::optional<boost::variant<int_t, range_max_t>> max;
};

Last but not least, the grammar:

range %= int_lit >> -(KW("to") >> (int_lit | KW_ATTR("max", ast::range_max_t::max)));

With the helper:

#define KW_ATTR(p, a) (qi::lexeme[(p) >> !(qi::alnum | '_')] >> qi::attr(a))

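Untested, but I am more confident today than I was yesterday that this approach is on the right track.

Worst case, if there is any type conflict between int_t, which is basically defined as long long, and the enumerated type range_max_t, then I could just store the keyword "max" to the same effect.

That is the worst case; I would like to keep it as simple as possible without losing sight of the specification.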
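On the possible type conflict, a quick check in plain C++ suggests the integer and the enum stay distinguishable inside a variant. This sketch uses std::optional and std::variant as stand-ins for the Boost types, and `is_open_ended` is a hypothetical helper; it says nothing about how Spirit propagates the attribute, which remains untested.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>

namespace sketch {
    using int_t = std::intmax_t;
    enum range_max_t { max };

    struct range_t {
        int_t min;
        // absent: single value; int_t: closed range; range_max_t: "to max"
        std::optional<std::variant<int_t, range_max_t>> max;
    };

    // True when the upper bound was written as the keyword "max".
    inline bool is_open_ended(range_t const& r) {
        return r.max && std::holds_alternative<range_max_t>(*r.max);
    }
}
```

Initializing from `sketch::range_max_t::max` selects the enum alternative and initializing from an `int_t` selects the integer alternative, since each is an exact match for its own type.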

Anyway, thanks again for the insights! up-vote

Answer 4

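I'm not sure I completely understand this aspect, apart from the fact that one builds and the other does not.

With extend_t you introduce a using type alias, content_t. I "get it", in the sense that this magically "just works". For instance: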

struct extend_t {
    using content_t = boost::variant<field_t, group_t, empty_statement_t>;
    msg_type_t msg_type;
    std::vector<content_t> content;
};

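However, contrasting that with a more traditional template inheritance and type definition, I'm not sure why the latter does not work. For instance: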

template<typename Content>
struct has_content {
    typedef Content content_type;
    content_type content;
};

// It is noteworthy, would need to identify the std::vector::value_type as well...
struct extend_t : has_content<std::vector<boost::variant<field_t, group_t, empty_statement_t>>> {
    msg_type_t msg_type;
};

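In that case, I start seeing symptoms of the forward declaration problem in the form of incomplete type errors.

I'm hesitant to accept the one as "gospel", as it were, without a better understanding of why.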
