Different token names for non-parameterized and parameterized statements, or how to jump to the previous token with RuleLexer
How can I get different token names for the following examples?
#someNameAttribute // here #someNameAttribute should be matched by the IDENTIFIER lexer rule
#someNameAttribute("2a3a796e-9870-4b88-9f2d-383eb9566613", 10) // here #someNameAttribute should be matched by PARAMETERIZED_IDENTIFIER, because it is followed by a parenthesis
My current grammar (it always assigns IDENTIFIER):
grammar Rule;
ruleExpression
: identifierExpression EOF | parameterizedIdentifierExpression EOF
;
identifierExpression
: IDENTIFIER
;
parameterizedIdentifierExpression
: PIDENTIFIER LPAREN UUID DELIMETER NUMERIC RPAREN
;
DELIMETER : ',';
LPAREN : '(';
RPAREN : ')';
UUID : '"'[0-9a-fA-F]+'-'[0-9a-fA-F]+'-'[1-5][0-9a-fA-F]+'-'[89abAB][0-9a-fA-F]+'-'[0-9a-fA-F]+'"';
NUMERIC : [0-9]+ ( '.' [0-9]+ )? ;
IDENTIFIER : '#' [a-zA-Z$_] [a-zA-Z$_0-9]*;
// PARAMETERIZED_IDENTIFIER : { behind(LPAREN) }? IDENTIFIER; // Tried a semantic predicate here, but no luck. I may have used it the wrong way.
WS : [ \r\t\u000C\n]+ -> skip;
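Alternatively, if it is possible to check from Java code whether the token following #someNameAttribute is a parenthesis, I would be glad to hear how to do it. I tried that approach as well: RuleLexer.nextToken() lets me inspect the next token, but I cannot jump back to the previous token afterwards to continue with the whole statement, so tokens start getting lost.
How can I use RuleLexer from Java code to predict which token name to assign, or how can I jump back to the previous token?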
Try something like this (works with the Java target only):
grammar Rule;
any : .*? EOF;
LPAREN : '(';
RPAREN : ')';
UUID : '"'[0-9a-fA-F]+'-'[0-9a-fA-F]+'-'[1-5][0-9a-fA-F]+'-'[89abAB][0-9a-fA-F]+'-'[0-9a-fA-F]+'"';
NUMERIC : [0-9]+ ( '.' [0-9]+ )? ;
PIDENTIFIER : IDENTIFIER {_input.LA(1) == '('}?;
IDENTIFIER : '#' [a-zA-Z$_] [a-zA-Z$_0-9]*;
WS : [ \r\t\u000C\n]+ -> skip;
OTHER : . ;
If whitespace is allowed between the identifier and the (, do something like this:
grammar Rule;
@lexer::members {
boolean spacesAndOpenParenAhead() {
for (int i = 1; ; i++) {
char ch = (char)_input.LA(i);
if (ch == '(') {
return true;
}
else if (ch != ' ' && ch != '\t' && ch != '\r' && ch != '\n') {
return false;
}
}
}
}
...
PIDENTIFIER : IDENTIFIER {spacesAndOpenParenAhead()}?;
IDENTIFIER : '#' [a-zA-Z$_] [a-zA-Z$_0-9]*;
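To try out the lookahead logic without generating a lexer, here is a minimal standalone sketch of the same check in plain Java. It is not part of the grammar: the class name LookaheadSketch and the (CharSequence, int) signature are assumptions made for illustration, standing in for the _input.LA(i) calls in the lexer member above. It also makes the end-of-input case explicit by returning false when no ( is found.

```java
final class LookaheadSketch {

    // Hypothetical stand-in for the lexer member: scans forward from index
    // `from` and reports whether only whitespace separates it from a '('.
    static boolean spacesAndOpenParenAhead(CharSequence input, int from) {
        for (int i = from; i < input.length(); i++) {
            char ch = input.charAt(i);
            if (ch == '(') {
                return true;
            }
            if (ch != ' ' && ch != '\t' && ch != '\r' && ch != '\n') {
                return false; // some other character comes first
            }
        }
        return false; // reached the end of the input without seeing '('
    }

    public static void main(String[] args) {
        String identifier = "#someNameAttribute";
        int end = identifier.length(); // position right after the identifier

        // Plain identifier: nothing follows it.
        System.out.println(spacesAndOpenParenAhead(identifier, end));
        // Identifier, then spaces, then an opening parenthesis.
        System.out.println(spacesAndOpenParenAhead(identifier + "  (\"x\", 10)", end));
    }
}
```

The first call prints false and the second prints true, which corresponds to the IDENTIFIER and PIDENTIFIER cases respectively.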
When I run the following code against both of my example grammars:
import org.antlr.v4.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "#someNameAttribute\n" +
"#someNameAttribute(\"2a3a796e-9870-4b88-9f2d-383eb9566613\", 10)";
RuleLexer lexer = new RuleLexer(CharStreams.fromString(source));
CommonTokenStream stream = new CommonTokenStream(lexer);
stream.fill();
for (Token t : stream.getTokens()) {
System.out.printf("%-20s `%s`%n",
RuleLexer.VOCABULARY.getDisplayName(t.getType()),
t.getText().replace("\n", "\\n")); // show line breaks as a literal \n
}
}
}
the following is printed to my console:
IDENTIFIER           `#someNameAttribute`
PIDENTIFIER          `#someNameAttribute`
'('                  `(`
UUID                 `"2a3a796e-9870-4b88-9f2d-383eb9566613"`
OTHER                `,`
NUMERIC              `10`
')'                  `)`