Why does a comma "," get matched by a [...] character set in an ANTLR lexer?

Question

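I'm writing a grammar for bash scripts, and I'm having trouble tokenizing the "," symbol. The grammar below tokenizes it as <BLOB>, but I want it tokenized as <OTHER>.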

grammar newgram;

code                : KEY (BLOB)+   (EOF | '\n')+;

KEY                 : 'wget';

BLOB                : [a-zA-Z0-9@!$^%*&+-.]+?;

OTHER               : .;

However, if I change BLOB to [a-zA-Z0-9@!$^%*&+.-]+?; then , is tokenized as <OTHER>.

I don't understand why this happens.

With the first version, the characters : and / are tokenized as <OTHER>, so I don't see why , should be tokenized as <BLOB>.

The input I'm tokenizing:

wget -o --quiet https,://www.google.com

The output I get with the grammar above:

[@0,0:3='wget',<'wget'>,1:0]
[@1,4:4=' ',<OTHER>,1:4]
[@2,5:5='-',<BLOB>,1:5]
[@3,6:6='o',<BLOB>,1:6]
[@4,7:7=' ',<OTHER>,1:7]
[@5,8:8='-',<BLOB>,1:8]
[@6,9:9='-',<BLOB>,1:9]
[@7,10:10='q',<BLOB>,1:10]
[@8,11:11='u',<BLOB>,1:11]
[@9,12:12='i',<BLOB>,1:12]
[@10,13:13='e',<BLOB>,1:13]
[@11,14:14='t',<BLOB>,1:14]
[@12,15:15=' ',<OTHER>,1:15]
[@13,16:16='h',<BLOB>,1:16]
[@14,17:17='t',<BLOB>,1:17]
[@15,18:18='t',<BLOB>,1:18]
[@16,19:19='p',<BLOB>,1:19]
[@17,20:20='s',<BLOB>,1:20]
[@18,21:21=',',<BLOB>,1:21]
[@19,22:22=':',<OTHER>,1:22]
[@20,23:23='/',<OTHER>,1:23]
[@21,24:24='/',<OTHER>,1:24]
[@22,25:25='w',<BLOB>,1:25]
[@23,26:26='w',<BLOB>,1:26]
[@24,27:27='w',<BLOB>,1:27]
[@25,28:28='.',<BLOB>,1:28]
[@26,29:29='g',<BLOB>,1:29]
[@27,30:30='o',<BLOB>,1:30]
[@28,31:31='o',<BLOB>,1:31]
[@29,32:32='g',<BLOB>,1:32]
[@30,33:33='l',<BLOB>,1:33]
[@31,34:34='e',<BLOB>,1:34]
[@32,35:35='.',<BLOB>,1:35]
[@33,36:36='c',<BLOB>,1:36]
[@34,37:37='o',<BLOB>,1:37]
[@35,38:38='m',<BLOB>,1:38]
[@36,39:39='\n',<'
'>,1:39]
[@37,40:39='<EOF>',<EOF>,2:0]
line 1:4 extraneous input ' ' expecting BLOB
line 1:7 extraneous input ' ' expecting {<EOF>, '
', BLOB}
line 1:15 extraneous input ' ' expecting {<EOF>, '
', BLOB}
line 1:22 extraneous input ':' expecting {<EOF>, '
', BLOB}
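For illustration only, here is a hypothetical Python emulation of the lexer rules above. It is not ANTLR itself: it assumes ANTLR-style longest-match selection with ties going to the earlier rule, and it assumes the implicit '\n' literal from the parser rule is tried first. With the unescaped - copied from the question, it reproduces the token types listed above, including ',' as BLOB and ':' and '/' as OTHER.

```python
import re

# Hypothetical emulation of the question's lexer rules, in assumed priority order.
# The implicit '\n' literal token (from the parser rule) is assumed to come first.
RULES = [
    ("'\\n'", re.compile(r"\n")),
    ("'wget'", re.compile(r"wget")),
    ("BLOB", re.compile(r"[a-zA-Z0-9@!$^%*&+-.]+?")),  # unescaped '-' as in the question
    ("OTHER", re.compile(r".", re.DOTALL)),             # any single character
]


def tokenize(text):
    """Return (type, lexeme) pairs: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        tokens.append(best)  # OTHER always matches, so best is never None
        pos += len(best[1])
    return tokens


for kind, lexeme in tokenize("wget -o --quiet https,://www.google.com\n"):
    print(kind, repr(lexeme))
```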

Answer 1

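As already mentioned in the comments, the - in +-. is interpreted as a range operator inside your character class. So +-. means every character from + (U+002B) to . (U+002E), and that range includes , (U+002C). The characters : (U+003A) and / (U+002F) fall outside it, which is why they were still tokenized as <OTHER>. Escape the hyphen like this (or move it to the end of the set, as in your second version): [a-zA-Z0-9@!$^%*&+\-.]+?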
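The range arithmetic can be checked with a short Python sketch. Python's `re` is used only as a stand-in for ANTLR's set syntax: it also reads an unescaped - between two characters as a range, and (consistent with what the question observed for ANTLR) treats a trailing - as a literal. The pattern names are hypothetical.

```python
import re

# Code points of the characters involved (ASCII order):
# '+' 0x2B, ',' 0x2C, '-' 0x2D, '.' 0x2E, '/' 0x2F, ':' 0x3A
covered = [chr(c) for c in range(ord("+"), ord(".") + 1)]
print(covered)  # ['+', ',', '-', '.']

# Hypothetical patterns contrasting the three ways of writing the hyphen.
as_range = re.compile(r"[+-.]")    # '+' through '.', so ',' is included
escaped = re.compile(r"[+\-.]")    # three literal characters: '+', '-', '.'
trailing = re.compile(r"[+.-]")    # a final '-' is also taken literally

for ch in ",:/":
    print(repr(ch),
          as_range.fullmatch(ch) is not None,
          escaped.fullmatch(ch) is not None,
          trailing.fullmatch(ch) is not None)
```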

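Also, a trailing [ ... ]+? at the end of a lexer rule always matches exactly one character, because the non-greedy loop stops as soon as the rule can succeed. That is why every character in your output is a separate BLOB token. So [a-zA-Z0-9@!$^%*&+\-.]+? can just as well be written as [a-zA-Z0-9@!$^%*&+\-.]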
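The effect of a non-greedy loop at the very end of a pattern can be seen with Python's `re` as an analogy. This is not ANTLR's lexer engine, but the lazy quantifier behaves the same way here: with nothing after it, one character is enough. The names below are hypothetical.

```python
import re

# Hypothetical stand-ins for the corrected BLOB set, using Python's re.
lazy = re.compile(r"[a-zA-Z0-9@!$^%*&+\-.]+?")   # non-greedy, as in the question
single = re.compile(r"[a-zA-Z0-9@!$^%*&+\-.]")   # the suggested simplification
greedy = re.compile(r"[a-zA-Z0-9@!$^%*&+\-.]+")  # greedy, for comparison only

text = "--quiet"
print(lazy.match(text).group())    # -
print(single.match(text).group())  # -
print(greedy.match(text).group())  # --quiet
```

The lazy and single-character forms always agree, which is the point of the simplification.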


Tags: compiler-construction, bash, parsing, antlr, lexer