ANTLR4: Problems with matching [2 TO 9] while [2 - 9] works fine

Question

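Sorry about the title; it's the best I could come up with while trying to be precise... I'm trying to parse a Lucene-like query syntax (simplified, with ordering)...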

The sequences I'm trying to match could look like this (each line is a separate sequence, passed in on its own):

age: [5 TO 9]        // fails: line 1:11 mismatched input '9' expecting WS
age: [5 TO *]        // fails: line 1:11 mismatched input '*' expecting WS
age: [* TO 9]        // success
age: [2345 TO 2110]  // fails: line 1:14 mismatched input '2110' expecting WS

height: [1.6 TO 2.0] // success
height: [* TO 2.0]   // success
height: [1.6 TO *]   // success

born: [2020-03-02 TO 2020-03-25]                                  // success
born: [2020-03-02T21:21:00 TO 2020-03-25T21:23:00]                // success
born: [2020-03-02T21:24:00+01:00 TO 2020-03-25T21:24:00+01:00]    // success
born: [* TO 2020-03-25T21:24:00+01:00]                            // success
born: [2020-03-02T21:24:00+01:00 TO *]                            // success

etc.

If I replace "TO" with "-", all of the failing ones suddenly pass.

age: [5 - 9]        // success
age: [5 - *]        // success
age: [* - 9]        // success
age: [2345 - 2110]  // success

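However, supporting that syntax is not desired at the moment. I only added it to see whether it had any effect, and to my surprise it did... So the grammar currently has WS ( TO | MINUS ) WS where it should be WS TO WS.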

The problematic part of the grammar is:

rangeClause    :
    fieldName = name
    WS? COLON WS?
    start = ( LSBR | LCBR ) WS?
    from = simple_value
    WS ( TO | MINUS ) WS
    to = simple_value WS?
    end =( RSBR | RCBR );

Apparently, when TO is used, it runs into trouble whenever the range clause starts with an integer, but I haven't figured out why yet.

Full grammar, with -target=JavaScript (other targets not considered for now):

grammar SimplifiedWithOrdering;

/* Inspired by: https://github.com/lrowe/lucenequery */

/*
 * Parser Rules
 */

query  : WS? clause = defaultClause  (WS order = orderingClause)? WS? EOF;

/*
 This implements all clauses grouped into batches of the same type.
 The order implements precedence (important).
*/

defaultClause : orClause (WS? orClause)*;
orClause      : andClause (orOperator andClause)*;
andClause     : notClause (andOperator notClause)*;
notClause     : basicClause (notOperator basicClause)*;
basicClause   :
  WS? LPA defaultClause WS? RPA
  | WS? atom
  ;

atom : value | field | rangeClause;

rangeClause    :
    fieldName = name
    WS? COLON WS?
    start = ( LSBR | LCBR ) WS?
    from = simple_value
    WS ( TO | MINUS ) WS
    to = simple_value WS?
    end =( RSBR | RCBR );

//Order
orderingClause    : WS? ORDER WS BY WS orderingField ( WS? COMMA WS? orderingField )* WS?;
orderingField     : WS? fieldName = name (WS direction = orderingDirection)?;
orderingDirection : (ASC | DESC);

field       : fieldName = name WS? fieldOperator = operator WS? fieldValue = value;
name        : TERM;

value       : TERM                                #VTerm
            | WILDCARD_TERM                       #VWildcard
            | NUMBER                              #VNumber
            | PHRASE                              #VPhrase
            | STAR                                #VMatchAll
            | DATE                                #VDate
            | DATE_TIME                           #VDateTime
            | DATE_OFFSET                         #VDateOffset
            ;

simple_value : TERM                                #STerm
             | STAR                                #SMatchAll
             | NUMBER                              #SNumber
             | DATE                                #SDate
             | DATE_TIME                           #SDateTime
             | DATE_OFFSET                         #SDateOffset
             ;

andOperator : WS? AND;
orOperator  : WS? OR;
notOperator : WS? (AND WS)? NOT;

operator : COLON  #Equals
         ;



/*
 * Lexer Rules
 */

LPA   : '(';
RPA   : ')';
LSBR  : '[';
RSBR  : ']';
LCBR  : '{';
RCBR  : '}';
STAR  : '*';
QMARK : '?';
COMMA : ',';
PLUS  : '+';
MINUS : '-';
DOT   : '.';
COLON : ':';

AND        : A N D      ;
OR         : O R        ;
NOT        : N O T      ;
ORDER      : O R D E R  ;
BY         : B Y        ;
ASC        : A S C      ;
DESC       : D E S C    ;
TO         : T O        ;

WS  : (' '|'\t'|'\r'|'\n'|'\u3000')+;

fragment INT        : [0-9];
fragment ESC        : '\\' .;

NUMBER  : MINUS? INT+ ('.' INT+)?;

// Special Date Handling:
//updated > 2018-03-04T14:41:23+00:00
fragment TIMEOFFSET  : ( MINUS | PLUS ) INT INT ( ':' INT INT );
TIME        : INT INT ':' INT INT ( ':' INT INT )? TIMEOFFSET?;
DATE        : INT INT INT INT MINUS INT INT MINUS INT INT;
DATE_TIME   : DATE 'T' TIME;

// Special Timespan Handling:
fragment TIME_IDEN_CHAR : [a-zA-Z];
fragment NOW         : N O W;
fragment TODAY       : T O D A Y;
fragment SIMPLE_TIMESPAN       : (INT+ '.')? INT INT ':' INT INT ( ':' INT INT ('.' INT INT))?;
fragment COMPLEX_TIMESPAN_PART : INT+ WS? TIME_IDEN_CHAR+;
fragment COMPLEX_TIMESPAN      : (COMPLEX_TIMESPAN_PART WS?)+;
fragment TIME_SPAN             : SIMPLE_TIMESPAN | COMPLEX_TIMESPAN;
DATE_OFFSET           : (NOW | TODAY)? WS? (PLUS|MINUS)? WS? TIME_SPAN;

fragment TERM_CHAR  : (~( ' ' | '\t' | '\n' | '\r' | '\u3000' | '\'' | '"'
                        | '(' | ')'  | '['  | ']'  | '{'      | '}'
                        | '!' | ':'  | '~'  | '>'  | '='      | '<'
                        | '?' | '*'
                        | '\\'| ',' )| ESC );

fragment WILDCARD_CHAR : (~( ' ' | '\t' | '\n' | '\r' | '\u3000' | '\'' | '"'
                           | '(' | ')'  | '['  | ']'  | '{'      | '}'
                           | '!' | ':'  | '~'  | '>'  | '='      | '<'
                           | '\\'| ',' )| ESC );

TERM   : TERM_CHAR+ ;
WILDCARD_TERM  : WILDCARD_CHAR+;

PHRASE : '"' ( ESC | ~('"'|'\\'))+ '"';

fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];

Answer 1

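You have to realize that the lexer constructs tokens independently of the parser. So even if the parser is "trying" to match a certain token, the lexer does not "facilitate" the parser in this. The lexer simply constructs tokens according to the following rules: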

1. Try to match as many characters as possible
2. If 2 or more lexer rules match the same number of characters, let the rule defined first "win"
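
To make these two rules concrete, here is a minimal JavaScript sketch of how a single token is chosen. It is not ANTLR's implementation: the rule table and the nextToken helper are hypothetical, and the regexes are simplified stand-ins for the grammar's TO, NUMBER and TERM rules, listed in their definition order.

```javascript
// Simplified stand-ins for three of the grammar's lexer rules, in definition order.
// The regexes are approximations for illustration, not the real ANTLR rules.
const rules = [
  ["TO", /^[tT][oO]/],
  ["NUMBER", /^-?[0-9]+(\.[0-9]+)?/],
  ["TERM", /^[^\s'"()\[\]{}!:~>=<?*\\,]+/],
];

// Return the token that the two rules would pick at the start of `input`.
function nextToken(input) {
  let best = null;
  for (const [type, re] of rules) {
    const m = re.exec(input);
    // A strictly longer match replaces the current best (rule 1);
    // on a tie the earlier rule is kept (rule 2).
    if (m && m[0].length > (best ? best.text.length : 0)) {
      best = { type, text: m[0] };
    }
  }
  return best;
}

console.log(nextToken("TO 9"));  // TO and TERM both match 2 chars; TO is defined first
console.log(nextToken("TODAY")); // TERM matches 5 chars, TO only 2; TERM wins
console.log(nextToken("42"));    // NUMBER and TERM tie; NUMBER is defined first
```

Note that nothing in nextToken knows which token the parser would like to see next; the choice depends only on the characters and the rule order.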

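Because of rule 1, the input '5 TO ' is tokenized as a single DATE_OFFSET token: COMPLEX_TIMESPAN_PART is INT+ WS? TIME_IDEN_CHAR+, which matches '5 TO', and COMPLEX_TIMESPAN then consumes the trailing space. That is longer than the single character NUMBER would match. So if you try to parse age: [5 TO 9], your parser has to work with the following tokens: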

TERM             'age'
COLON            ':'
WS               ' '
LSBR             '['
DATE_OFFSET      '5 TO '
NUMBER           '9'
RSBR             ']'

It cannot match rangeClause against these tokens (hence the error message).

This also explains why age: [* TO 9] is parsed just fine, because then the following tokens are created for the parser to use:

TERM             'age'
COLON            ':'
WS               ' '
LSBR             '['
STAR             '*'
WS               ' '
TO               'TO'
WS               ' '
NUMBER           '9'
RSBR             ']'

A possible solution is to remove DATE_OFFSET from the lexer and try to create a parser rule for it instead.
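
The token streams above, and the effect of taking DATE_OFFSET out of the lexer, can be reproduced with a small JavaScript simulation of longest-match tokenization. This is a sketch under assumptions, not ANTLR itself: RULES, tokenize, types and withoutDateOffset are hypothetical names, only the rules relevant to these inputs are included, and the DATE_OFFSET regex models just the COMPLEX_TIMESPAN branch.

```javascript
// Approximations of the relevant lexer rules, in the grammar's definition order.
// DATE_OFFSET models only the COMPLEX_TIMESPAN branch; everything is simplified.
const RULES = [
  ["LSBR", /^\[/],
  ["RSBR", /^\]/],
  ["STAR", /^\*/],
  ["MINUS", /^-/],
  ["COLON", /^:/],
  ["TO", /^[tT][oO]/],
  ["WS", /^[ \t\r\n\u3000]+/],
  ["NUMBER", /^-?[0-9]+(\.[0-9]+)?/],
  ["DATE_OFFSET", /^(?:now|today)?\s*[+-]?\s*(?:[0-9]+\s*[a-zA-Z]+\s*)+/i],
  ["TERM", /^[^\s'"()\[\]{}!:~>=<?*\\,]+/],
];

// Longest match wins; on equal length the earlier rule wins.
function tokenize(input, rules = RULES) {
  const tokens = [];
  let rest = input;
  while (rest.length > 0) {
    let best = null;
    for (const [type, re] of rules) {
      const m = re.exec(rest);
      if (m && m[0].length > (best ? best.text.length : 0)) {
        best = { type, text: m[0] };
      }
    }
    if (!best) throw new Error(`no rule matches at: ${rest}`);
    tokens.push(best);
    rest = rest.slice(best.text.length);
  }
  return tokens;
}

const types = (ts) => ts.map((t) => t.type).join(" ");
const withoutDateOffset = RULES.filter(([type]) => type !== "DATE_OFFSET");

console.log(types(tokenize("age: [5 TO 9]")));
// TERM COLON WS LSBR DATE_OFFSET NUMBER RSBR
console.log(types(tokenize("age: [* TO 9]")));
// TERM COLON WS LSBR STAR WS TO WS NUMBER RSBR
console.log(types(tokenize("age: [5 TO 9]", withoutDateOffset)));
// TERM COLON WS LSBR NUMBER WS TO WS NUMBER RSBR
```

With DATE_OFFSET gone from the lexer, the third call yields the WS TO WS sequence that rangeClause expects. Date offsets would then have to be recognized by a parser rule built from smaller tokens, which is not shown here.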
