Repeating Pattern Matching in antlr4

I'm trying to write a lexer rule that matches strings like a, aa, aaa and bbbb.

The requirement here is that all characters must be the same.

I tried this rule: REPEAT_CHARS: ([a-z])(\1)*

But \1 is not valid in antlr4. Is it possible to come up with a pattern for this?

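You can't do this in an ANTLR lexer. At least, not without target-specific code in your grammar, and putting code in your grammar is something you shouldn't do: it makes the grammar hard to read and ties it to one target language. It is better to do these kinds of checks/validations in a listener or visitor.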
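A minimal sketch of that kind of check, assuming the lexer uses a plain rule such as WORD : [a-z]+ ; (a hypothetical rule name) and the "all characters are the same" test is applied afterwards, for example from an exit method of a generated listener. The functions is_single_repeated_char and check_tokens are hypothetical helpers for illustration; the ANTLR runtime itself is left out so the snippet runs on its own.

```python
def is_single_repeated_char(text: str) -> bool:
    """Return True if text is non-empty and consists of one character repeated."""
    # A set of the characters has exactly one element only when every
    # character is identical; an empty string yields an empty set and fails.
    return len(set(text)) == 1


def check_tokens(tokens: list[str]) -> list[str]:
    """Return the token texts that fail the same-character check.

    Hypothetical validation step: in a real parser this would be called
    with token text taken from the parse tree inside a listener/visitor.
    """
    return [t for t in tokens if not is_single_repeated_char(t)]


if __name__ == "__main__":
    # Token texts as a rule like WORD : [a-z]+ ; might produce them (assumption).
    sample = ["a", "aa", "aaa", "bbbb", "ab", "abba"]
    print(check_tokens(sample))  # prints ['ab', 'abba']
```

This keeps the grammar target-independent: the lexer only decides that something is a word, and the same-character rule lives in ordinary code where it is easy to test and to report a helpful error.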

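Things like backreferences and lookarounds are features that crept into the regex engines of programming languages. The regular expression syntax available in ANTLR (and in all parser generators I know of) does not support these features; it describes true regular languages.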

Many features found in virtually all modern regular expression libraries provide an expressive power that far exceeds the regular languages. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (backreferences). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called squares in formal language theory.

-- https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
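For contrast, here is what the backreference from the question looks like in a programming-language regex engine. This is a small sketch using Python's standard re module (a backtracking engine, not ANTLR); it shows both the original ([a-z])\1* idea and a "square" pattern like the papa / WikiWiki example above. The helper names all_same and is_square are illustrative only.

```python
import re

# \1 re-matches exactly what group 1 captured: one letter, then zero or more
# copies of that same letter.
SAME_CHAR = re.compile(r"([a-z])\1*")

# A "square": a non-empty string immediately followed by an identical copy.
SQUARE = re.compile(r"(.+)\1")


def all_same(s: str) -> bool:
    """True if the whole string is one lowercase letter repeated."""
    return SAME_CHAR.fullmatch(s) is not None


def is_square(s: str) -> bool:
    """True if the whole string is some substring written twice in a row."""
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for s in ["a", "aa", "bbbb", "ab"]:
        print(s, all_same(s))
    for s in ["papa", "WikiWiki", "papas"]:
        print(s, is_square(s))
```

The square language is not regular, which is why a lexer syntax restricted to regular languages, such as ANTLR's, has no equivalent of \1.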