ANTLR 4 in Python not working as expected (trying to parse the chapter and paragraph of a book)
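I want to create a very simple ANTLR4 parser (in Python), without listeners or visitors, that takes the chapter and paragraph of a book as input, in either order, and returns the high_level (chapter) and low_level (paragraph) entries. For example, if I enter 2 a or a 2, it should print "chapter 2, paragraph a".
This is my Example.g4: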
grammar Example;

text
    : paragraph ;

paragraph
    : high_level (WS low_level)?
    | low_level WS high_level
    ;

low_level
    : 'a' | 'b' | 'c' ;

high_level
    : '1' | '2' | '3' ;

WS : [ \t\r\n]+ ;
I ran this in my terminal:
java -jar ~/antlr-4.8-complete.jar -Dlanguage=Python3 -no-listener -no-visitor Example.g4
This generates two Python files. I then wrote the following Python script:
from antlr4 import *
from ExampleLexer import ExampleLexer
from ExampleParser import ExampleParser

def main():
    while True:
        text = InputStream(input(">"))
        lexer = ExampleLexer(text)
        stream = CommonTokenStream(lexer)
        parser = ExampleParser(stream)
        tree = parser.text()
        query = tree.paragraph()
        low_level = query.low_level()
        high_level = query.high_level()
        print(f"chapter {high_level}, paragraph {low_level}")

if __name__ == '__main__':
    main()
However, when I run it and enter 2 a, I get this:
chapter [10 8], paragraph [12 8]
Can anyone explain what I am doing wrong? I don't understand the numbers in the square brackets.
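That is just some debugging information displayed by RuleContext (which your generated Low_levelContext and High_levelContext classes extend). In your case, it shows the rule's invokingState followed by the invokingState of its parentCtx.
Look at the source code: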
class RuleContext(RuleNode):
    ...
    def __str__(self):
        return self.toString(None, None)
    ...
    def toString(self, ruleNames:list, stop:RuleContext)->str:
        with StringIO() as buf:
            p = self
            buf.write("[")
            while p is not None and p is not stop:
                if ruleNames is None:
                    if not p.isEmpty():
                        buf.write(str(p.invokingState))
                else:
                    ri = p.getRuleIndex()
                    ruleName = ruleNames[ri] if ri >= 0 and ri < len(ruleNames) else str(ri)
                    buf.write(ruleName)

                if p.parentCtx is not None and (ruleNames is not None or not p.parentCtx.isEmpty()):
                    buf.write(" ")

                p = p.parentCtx

            buf.write("]")
            return buf.getvalue()
    ...
https://github.com/antlr/antlr4/blob/master/runtime/Python3/src/antlr4/RuleContext.py
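To make the origin of [10 8] concrete, here is a minimal sketch that reproduces only the "ruleNames is None" branch of toString with a stand-in class. FakeContext is hypothetical and not part of the antlr4 runtime. The invoking states 10, 12 and 8 are taken from the output in the question, and the root context is assumed to have invokingState -1, which is the value isEmpty() checks for.

```python
from io import StringIO


class FakeContext:
    """Hypothetical stand-in for antlr4's RuleContext (illustration only)."""

    def __init__(self, invoking_state, parent=None):
        self.invokingState = invoking_state
        self.parentCtx = parent

    def isEmpty(self):
        # Assumption: the root context carries invokingState -1.
        return self.invokingState == -1

    def __str__(self):
        # Mirrors toString(None, None): walk up the parent chain and
        # write each non-empty context's invokingState.
        with StringIO() as buf:
            p = self
            buf.write("[")
            while p is not None:
                if not p.isEmpty():
                    buf.write(str(p.invokingState))
                if p.parentCtx is not None and not p.parentCtx.isEmpty():
                    buf.write(" ")
                p = p.parentCtx
            buf.write("]")
            return buf.getvalue()


# Context chain for the input "2 a": text -> paragraph -> high_level / low_level
text = FakeContext(-1)
paragraph = FakeContext(8, parent=text)
high_level = FakeContext(10, parent=paragraph)
low_level = FakeContext(12, parent=paragraph)

print(f"chapter {high_level}, paragraph {low_level}")
# prints: chapter [10 8], paragraph [12 8]
```

So the first number belongs to the context being printed and the second to its parent; neither is the matched text.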
You didn't explain what you want to display, but I guess it is the text matched by the rules, in which case you can do this:
print(f"chapter {high_level.getText()}, paragraph {low_level.getText()}")
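One thing to watch for: the grammar makes the paragraph optional in "high_level (WS low_level)?", so for an input such as 2 the generated low_level() accessor should return None, and calling getText() on it would raise an AttributeError. Below is a rough sketch of a guard. The describe helper and the fake_paragraph stand-in are hypothetical names introduced here so the example runs without the generated parser; in your script you would pass tree.paragraph() to describe instead.

```python
from types import SimpleNamespace


def describe(query):
    """Format a paragraph context that offers high_level() and low_level()."""
    high = query.high_level()
    low = query.low_level()  # None when the optional "(WS low_level)?" part is absent
    if high is None:
        # Assumption: this can happen when the input failed to parse cleanly.
        return "no chapter found"
    if low is None:
        return f"chapter {high.getText()}"
    return f"chapter {high.getText()}, paragraph {low.getText()}"


def fake_paragraph(high=None, low=None):
    """Hypothetical stand-in for a generated ParagraphContext."""
    def node(text):
        return None if text is None else SimpleNamespace(getText=lambda: text)
    return SimpleNamespace(high_level=lambda: node(high),
                           low_level=lambda: node(low))


print(describe(fake_paragraph("2", "a")))  # prints: chapter 2, paragraph a
print(describe(fake_paragraph("2")))       # prints: chapter 2
```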