ANTLR4 finding tokens but returns truncated parse tree

Question

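I support an open source project, and my ANTLR-based parser is returning a truncated ParseTree. I believe I have provided everything needed to reproduce the problem.

Given a parser created with ANTLR 4.8-1 and configured as follows: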

    public static Expressions parse(String mappingExpression) throws ParseException, IOException {

        // Expressions can include references to properties within an
        // application interface ("state"),
        // properties within an event, and various operators and functions.
       InputStream targetStream = new ByteArrayInputStream(mappingExpression.getBytes());
        CharStream input = CharStreams.fromStream(targetStream,Charset.forName("UTF-8"));

        MappingExpressionLexer lexer = new MappingExpressionLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        MappingExpressionParser parser = new MappingExpressionParser(tokens);
        ParseTree tree = null;

        BufferingErrorListener errorListener = new BufferingErrorListener();

        try {
            // remove the default error listeners which print to stderr
            parser.removeErrorListeners();
            lexer.removeErrorListeners();

            // replace with error listener that buffer errors and allow us to retrieve them
            // later
            parser.addErrorListener(errorListener);
            lexer.addErrorListener(errorListener);

            tree = parser.expr();

and given the following statement to parse:

results.( $y := "test"; $bta := function($x) {(     $count($x.billToAccounts) > 1       ? ($contains($join($x.billToAccounts, ','), "super") ? "Super" : "Standard")        : ($contains($x.billToAccounts[0], "super") ? "Super" : "Standard") )}; { "users": $filter($, function($v, $i, $a) { $v.status = "PROVISIONED" }) { "firstName": $.profile.firstName, "lastName": $.profile.lastName, "email": $.profile.login, "lastLogin": $.lastLogin, "id" : $.id, "userType": $bta($.profile) } } )

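The ParseTree returned contains only the "results" token, even though all tokens were produced by the lexer (as shown in the _input.tokens array) and all appear to be on channel 0.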

I would expect the parser to continue building the _localCtx, but this statement in MappingExpressionParser:

_alt = getInterpreter().adaptivePredict(_input,17,_ctx);

returns 2, so the _localCtx is not expanded any further and contains only a TerminalNodeContext with "results".

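I have tried rearranging various rules and suspect it has to do with the position of the parens rule relative to the expr rule, but I am missing something.

What is causing adaptivePredict to return 2 so soon?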

/**
 * (c) Copyright 2018, 2019 IBM Corporation
 * 1 New Orchard Road,
 * Armonk, New York, 10504-1722
 * United States
 * +1 914 499 1900
 * support: Nathaniel Mills wnm3@us.ibm.com
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *
 */

/* Antlr grammar defining the mapping expression language */

grammar MappingExpression;

/* The start rule; begin parsing here.
   operator precedence is implied by the ordering in this list */


// =======================
// = PARSER RULES
// =======================

expr:
   ID                                                     # id
 | '*' ('.' expr)?                                        # field_values
 | DESCEND ('.' expr)?                                    # descendant
 | DOLLAR (('.' expr) | (ARR_OPEN expr ARR_CLOSE))?       # context_ref
 | ROOT ('.' expr)?                                       # root_path
 | '(' (expr (';' (expr)?)*)? ')'                         # parens
 | ARR_OPEN exprOrSeqList? ARR_CLOSE                      # array_constructor
 | OBJ_OPEN fieldList? OBJ_CLOSE                          # object_constructor
 | expr ARR_OPEN ARR_CLOSE                                # to_array
 | expr '.' expr                                          # path
 | expr ARR_OPEN expr ARR_CLOSE                           # array
 | VAR_ID (emptyValues | exprValues)                      # function_call
 | FUNCTIONID varList '{' exprList? '}'                   # function_decl
 | VAR_ID ASSIGN (expr | (FUNCTIONID varList '{' exprList? '}'))                   # var_assign
 | (FUNCTIONID varList '{' exprList? '}') exprValues                               # function_exec
 | op=(TRUE|FALSE)                                        # boolean
 | op='-' expr                                            # unary_op
 | expr op=('*'|'/'|'%') expr                             # muldiv_op
 | expr op=('+'|'-') expr                                 # addsub_op
 | expr '&' expr                                          # concat_op
 | expr 'in' expr                                         # membership
 | expr 'and' expr                                        # logand
 | expr 'or' expr                                         # logor
 | expr op=('<'|'<='|'>'|'>='|'!='|'=') expr              # comp_op
 | expr '?' expr (':' expr)?                              # conditional
 | expr CHAIN expr                                        # fct_chain
 | VAR_ID                                                 # var_recall
 | NUMBER                                                 # number
 | STRING                                                 # string
 | 'null'                                                 # null
 ;

fieldList : STRING ':' expr (',' STRING ':' expr)*;
exprList : expr (',' expr)* ;
varList : '('  (VAR_ID (',' VAR_ID)*)* ')' ;
exprValues : '(' exprList ')' ((',' exprOrSeq)* ')')?;
emptyValues : '(' ')' ;
seq : expr '..' expr ;

exprOrSeq : seq | expr ;
exprOrSeqList : exprOrSeq (',' exprOrSeq)* ;

// =======================
// = LEXER RULES
// =======================

TRUE : 'true';
FALSE : 'false';


STRING
    : '\'' (ESC | ~['\\])* '\''
    | '"'  (ESC | ~["\\])* '"'
    ;

NULL : 'null';

ARR_OPEN  : '[';
ARR_CLOSE : ']';

OBJ_OPEN  : '{';
OBJ_CLOSE : '}';

DOLLAR : '$';
ROOT : '$$' ;
DESCEND : '**';

NUMBER
    :   INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3
    |   INT EXP             // 1e10 3e4
    |   INT                 // 3, 45
    ;

FUNCTIONID : 'function' ;

WS: [ \t\n]+ -> skip ;                // ignore whitespace
COMMENT:  '/*' .*? '*/' -> skip;      // allow comments

// Assign token names used in above grammar
CHAIN : '~>' ;
ASSIGN : ':=' ;
MUL : '*' ;
DIV : '/' ;
ADD : '+' ;
SUB : '-' ;
REM : '%' ;
EQ  : '=' ;
NOT_EQ  : '!=' ;
LT  : '<' ;
LE  : '<=' ;
GT  : '>' ;
GE  : '>=' ;
CONCAT : '&';

VAR_ID : '$' ID ;

ID
    : [a-zA-Z] [a-zA-Z0-9_]*
    | BACK_QUOTE ~[`]* BACK_QUOTE;


// =======================
// = LEXER FRAGMENTS
// =======================


fragment ESC :   '\\' (["'\\/bfnrt] | UNICODE) ;
fragment UNICODE : ([\u0080-\uFFFF] | 'u' HEX HEX HEX HEX) ;
fragment HEX : [0-9a-fA-F] ;

fragment INT :   '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP :   [Ee] [+\-]? INT ;    // \- since - means "range" inside [...]

fragment SINGLE_QUOTE : '\'';
fragment DOUBLE_QUOTE : '"';
fragment BACK_QUOTE : '`';

Answer 1

Although tokens are created for the entire example input, not all of them are consumed by the parser. If you run this:

String mappingExpression = "results.(\n" +
        "    $y := \"test\"; \n" +
        "    $bta := function($x) {\n" +
        "        (\n" +
        "            $count($x.billToAccounts) > 1 \n" +
        "              ? ($contains($join($x.billToAccounts, ','), \"super\") ? \"Super\" : \"Standard\")\n" +
        "              : ($contains($x.billToAccounts[0], \"super\") ? \"Super\" : \"Standard\") \n" +
        "        )\n" +
        "    };\n" +
        "    { \n" +
        "        \"users\": $filter($, function($v, $i, $a) { \n" +
        "            $v.status = \"PROVISIONED\" \n" +
        "        })\n" +
        "        { \n" +
        "            \"firstName\": $.profile.firstName, \n" +
        "            \"lastName\": $.profile.lastName, \n" +
        "            \"email\": $.profile.login, \n" +
        "            \"lastLogin\": $.lastLogin, \n" +
        "            \"id\" : $.id, \n" +
        "            \"userType\": $bta($.profile) \n" +
        "        }\n" +
        "    } \n" +
        ")";

InputStream targetStream = new ByteArrayInputStream(mappingExpression.getBytes());
MappingExpressionLexer lexer = new MappingExpressionLexer(CharStreams.fromStream(targetStream, StandardCharsets.UTF_8));
MappingExpressionParser parser = new MappingExpressionParser(new CommonTokenStream(lexer));
ParseTree tree = parser.expr();

System.out.println(tree.toStringTree(parser));

the following will be printed:

(expr results)

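which means that expr successfully matched its first alternative, ID, and then stopped. Since the expr rule is not anchored by EOF, the parser is allowed to stop at the point where the remaining input no longer fits.

To force the parser to consume all tokens, introduce the following rule: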

expr_to_eof
 : expr EOF
 ;

and change:

ParseTree tree = parser.expr();

into:

ParseTree tree = parser.expr_to_eof();

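When you run the snippet I posted again (with the default error listeners!), you will see some error messages on the console (i.e. the parser did not process the input successfully).

If I try to parse the input: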
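The effect of a missing EOF can be sketched without ANTLR. The toy recursive-descent parser below is only an analogy, not how ANTLR's adaptivePredict works internally: the class PrefixParseDemo, its simplified rule `expr : ID ('.' ID)*` and the pre-split token list are all assumptions made for illustration. It shows that a rule without EOF happily returns after the longest prefix it can match, while the EOF-anchored variant reports the first token it cannot handle.

```java
import java.util.List;

// Toy analogy (not ANTLR): a rule without EOF stops after the longest matching prefix.
class PrefixParseDemo {
    private final List<String> tokens;
    private int pos = 0;

    PrefixParseDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    private static boolean isId(String t) {
        return t.matches("[a-zA-Z][a-zA-Z0-9_]*");
    }

    // Look ahead k tokens without consuming anything.
    private String peek(int k) {
        int i = pos + k;
        return i < tokens.size() ? tokens.get(i) : "<EOF>";
    }

    // expr : ID ('.' ID)* ;   (no EOF, so leftover tokens are silently ignored)
    String expr() {
        if (!isId(peek(0))) {
            throw new IllegalStateException("expected ID at index " + pos);
        }
        StringBuilder tree = new StringBuilder("(expr ").append(tokens.get(pos++));
        // Continue only while the next two tokens fit the loop body; otherwise exit.
        while (peek(0).equals(".") && isId(peek(1))) {
            tree.append(" . ").append(tokens.get(pos + 1));
            pos += 2;
        }
        return tree.append(")").toString();
    }

    // expr_to_eof : expr EOF ;   (leftover tokens become an error)
    String exprToEof() {
        String tree = expr();
        if (pos < tokens.size()) {
            throw new IllegalStateException(
                "unexpected token '" + peek(0) + "' at index " + pos);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<String> input = List.of("results", ".", "(", "$y", ":=", "\"test\"", ")");
        // prints: (expr results)
        System.out.println(new PrefixParseDemo(input).expr());
        try {
            new PrefixParseDemo(input).exprToEof();
        } catch (IllegalStateException e) {
            // prints: error: unexpected token '.' at index 1
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In the real grammar the same idea applies: anchoring the start rule with EOF turns the silently ignored remainder into a reported syntax error.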

results.(
    $y := "test"; 
    $bta := function($x) {
        (
            $count($x.billToAccounts) > 1 
              ? ($contains($join($x.billToAccounts, ','), "super") ? "Super" : "Standard")
              : ($contains($x.billToAccounts[0], "super") ? "Super" : "Standard")
        )
    };
    { 
        "users": $filter($, function($v, $i, $a) { 
            $v.status = "PROVISIONED" 
        })
    } 
)

then the parser has no problems. Inspecting the tree for:

{ 
    "users": $filter($, function($v, $i, $a) { 
        $v.status = "PROVISIONED" 
    })
}

I see that it is recognized as OBJ_OPEN fieldList? OBJ_CLOSE, where fieldList is defined as follows:

fieldList : STRING ':' expr (',' STRING ':' expr)*;

i.e. a comma-separated list of key-value pairs. So if you give the parser this:

{
    "users": $filter($, function($v, $i, $a) {
        $v.status = "PROVISIONED"
    })
    {
        "firstName": $.profile.firstName,
        "lastName": $.profile.lastName,
        "email": $.profile.login,
        "lastLogin": $.lastLogin,
        "id" : $.id,
        "userType": $bta($.profile)
    }
}

it cannot parse it properly, because:

{
    "firstName": $.profile.firstName,
    "lastName": $.profile.lastName,
    "email": $.profile.login,
    "lastLogin": $.lastLogin,
    "id" : $.id,
    "userType": $bta($.profile)
}

is not a key-value pair by itself, and there is no comma separating the two.

This would be parsed properly:

{
    "users": $filter($, function($v, $i, $a) {
        $v.status = "PROVISIONED"
    }),
    "some-key": {
        "firstName": $.profile.firstName,
        "lastName": $.profile.lastName,
        "email": $.profile.login,
        "lastLogin": $.lastLogin,
        "id" : $.id,
        "userType": $bta($.profile)
    }
}

Or $filter($, function($v, $i, $a) { $v.status = "PROVISIONED" }) should be allowed to be followed directly by { "firstName": ... }, but I cannot see from your grammar how that would be valid.
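To make the fieldList argument concrete, here is a minimal sketch of a hand-written checker for a simplified version of that rule. It is not the generated ANTLR parser: the class FieldListDemo is hypothetical, every key is reduced to the placeholder token STRING, and every non-object expression (such as the $filter(...) call) is reduced to the placeholder token VALUE.

```java
import java.util.List;

// Toy checker for: object : '{' fieldList? '}' ;
//                  fieldList : STRING ':' value (',' STRING ':' value)* ;
//                  value : object | VALUE ;   (VALUE stands in for any other expr)
class FieldListDemo {
    private final List<String> tokens;
    private int pos = 0;

    FieldListDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private void expect(String t) {
        if (!peek().equals(t)) {
            throw new IllegalStateException(
                "expected '" + t + "' but found '" + peek() + "' at index " + pos);
        }
        pos++;
    }

    private void object() {
        expect("{");
        if (!peek().equals("}")) {
            fieldList();
        }
        expect("}");
    }

    private void fieldList() {
        field();
        // A further field is only possible after a comma.
        while (peek().equals(",")) {
            pos++;
            field();
        }
    }

    private void field() {
        expect("STRING");
        expect(":");
        value();
    }

    private void value() {
        if (peek().equals("{")) {
            object();
        } else {
            expect("VALUE");
        }
    }

    // True only if the whole token list is exactly one object.
    static boolean accepts(List<String> tokens) {
        FieldListDemo p = new FieldListDemo(tokens);
        try {
            p.object();
            return p.pos == tokens.size();
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // { "users": <value> { "firstName": <value> } }   (object follows the value directly)
        List<String> noComma = List.of(
            "{", "STRING", ":", "VALUE", "{", "STRING", ":", "VALUE", "}", "}");
        // { "users": <value>, "some-key": { "firstName": <value> } }
        List<String> withKey = List.of(
            "{", "STRING", ":", "VALUE", ",", "STRING", ":",
            "{", "STRING", ":", "VALUE", "}", "}");
        System.out.println(accepts(noComma)); // prints: false
        System.out.println(accepts(withKey)); // prints: true
    }
}
```

The first input fails at the inner `{` because, after a value, this rule only allows `,` or the closing `}`.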
