Recursive groups in ANTLR4

Question

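I'm new to ANTLR4 and I'm having a hard time with its grammar syntax. Suppose you have data built using the following rules:

A message is a multiline collection of groups
A group consists of a segment and possibly modifiers
A segment is a three-character alphanumeric
A [ modifier indicates an optional group, which is bounded by ]
A { modifier indicates a repetition group, which is bounded by }

An example of this data format is: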

MSH
MSA
[{ ERR }]
[{ NTE }]
[
    [
        PID
        [{NTE}]
    ]
    {
        ORC
        [
             {
                  [TQ1]
                  [{ TQ2 }]
             }
        ]
    //shortened for brevity
    }
]

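So this reads as:

A required single MSH segment
A required single MSA segment
An optional group consisting of an optional PID group (PID with an optional repeating NTE) and a repeating ORC group, which contains an optional repeating group made up of an optional TQ1 and an optional repeating TQ2
And so on...

So far I have: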

message : group+ NEWLINE ;

group : ID+
      | (ID | '{'group'}'
      | (ID | '['group']'
      ;

OPTSTART : '[' ;
OPTEND : ']' ;
REPSTART : '{' ;
REPEND : '}' ;
ID : [a-zA-Z0-9]*
WS : [ \t\r\n]+ -> skip ;

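I'm now stuck. I've managed to get the parse tree to parse MSH and MSA, but I'm not sure whether I'm on the right track. Any pointers or hints would be appreciated.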

Answer 1

Given the DSL description:

A message is a multiline collection of groups

A group consists of a segment and possibly modifiers

a segment is a three char alphanumeric

a [ modifier indicates an optional group which (the group) is bounded by ]

a { modifier indicates a repetition group which (the group) is bounded by }

it can be translated directly into an ANTLR grammar:

// each group self-terminates, so no NL terminal required
// use EOF terminal to ensure that entire source is parsed
message : group+ EOF ;

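// list each possible representation of a group;
// a bracketed or braced group may hold one or more nested groups,
// as the sample message requires (e.g. PID followed by [{NTE}])
group   : LBRACK group+ RBRACK
        | LBRACE group+ RBRACE
        | SEGMENT
        ;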

// literal implementation of a segment
SEGMENT : Char Char Char ;

// define literals only once
LBRACK  : '[' ;
RBRACK  : ']' ;
LBRACE  : '{' ;
RBRACE  : '}' ;

// all whitespace is inconsequential
WS      : [ \t\r\n]  -> skip ;

fragment Char : [a-zA-Z0-9] ;
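
To check the recursion without generating an ANTLR parser, the same rules can be sketched as a small hand-written recursive-descent recognizer in Python. This is only an illustration, not ANTLR output: the names tokenize, parse_group, parse_groups and parse_message and the tuple-based tree shape are made up for this sketch. It assumes that a bracketed or braced group may contain one or more groups, as the sample message does, and that the data itself contains no comments.

```python
import re

# Mirrors the lexer: alphanumeric runs, single bracket/brace characters,
# and any other non-whitespace character (reported as an error below).
TOKEN = re.compile(r"[A-Za-z0-9]+|[\[\]{}]|\S")
BRACKETS = set("[]{}")
CLOSERS = {"[": "]", "{": "}"}
KINDS = {"[": "optional", "{": "repeat"}


def tokenize(text):
    """Split text into tokens, skipping whitespace; segments must be 3 chars."""
    tokens = TOKEN.findall(text)
    for tok in tokens:
        if tok not in BRACKETS and not re.fullmatch(r"[A-Za-z0-9]{3}", tok):
            raise SyntaxError(f"unexpected token {tok!r}")
    return tokens


def parse_group(tokens, pos):
    """group : '[' group+ ']' | '{' group+ '}' | SEGMENT ; returns (node, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in CLOSERS:
        children, pos = parse_groups(tokens, pos + 1, CLOSERS[tok])
        return (KINDS[tok], children), pos + 1  # pos + 1 skips the closer
    if tok in ("]", "}"):
        raise SyntaxError(f"unexpected {tok!r}")
    return tok, pos + 1  # a plain SEGMENT


def parse_groups(tokens, pos, closer):
    """Parse one or more groups up to `closer` (or end of input if closer is None)."""
    groups = []
    while pos < len(tokens) and tokens[pos] != closer:
        node, pos = parse_group(tokens, pos)
        groups.append(node)
    if closer is not None and pos >= len(tokens):
        raise SyntaxError(f"missing {closer!r}")
    if not groups:
        raise SyntaxError("expected at least one group")
    return groups, pos


def parse_message(text):
    """message : group+ EOF ;"""
    groups, _ = parse_groups(tokenize(text), 0, None)
    return groups
```

Calling parse_message("MSH MSA [{ ERR }]") yields the two segment names followed by a nested ("optional", [("repeat", ["ERR"])]) tuple, and unbalanced input such as "[ MSH" raises SyntaxError. In a real project the generated ANTLR parser and its parse tree would take the place of this sketch.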

parsing

antlr4