在 Menhir 中,如果没有运算符,是否可以使规则左结合?

In Menhir is it possible to make a rule left-associative if it does not have an operator?

我正在尝试使用 Menhir 为正则表达式语言编写解析器。在我修改它以消除歧义之前,我想要的语法看起来有点像下面的例子。请注意,"sequencing / concatenation" 是隐式的,没有与该操作关联的标记。

%token LPAREN RPAREN
%token CHAR STAR PIPE
%token EOF

%start <unit> parse

%%

parse: re EOF {()}

re:
  | LPAREN re RPAREN {()}  (* Grouping *)
  | CHAR             {()}  (* Single character *)
  | re STAR          {()}  (* Kleene star *)
  | re re            {()}  (* Sequencing / Concatenation *)
  | re PIPE re       {()}  (* Alternation *)

如果我有一个用于连接的标记,我将能够通过使用优先声明来消除歧义

%token LPAREN RPAREN
%token CHAR STAR PIPE
%token CONCAT
%token EOF

%left PIPE
%left CONCAT
%nonassoc STAR

%start <unit> parse

%%

parse: re EOF {()}

re:
  | LPAREN re RPAREN   {()}  (* Grouping *)
  | CHAR               {()}  (* Single character *)
  | re STAR            {()}  (* Kleene star *)
  | re CONCAT re       {()}  (* Sequencing / Concatenation *)
  | re PIPE re         {()}  (* Alternation *)

但是,如果串联规则中没有 CONCAT 标记,我将无法正常工作。我尝试使用 %prec 声明,但仍然存在一些 shift/reduce 冲突。

%token LPAREN RPAREN
%token CHAR STAR PIPE
%token CONCAT
%token EOF

%left PIPE
%left CONCAT
%nonassoc STAR

%start <unit> parse

%%

parse: re EOF {()}

re:
  | LPAREN re RPAREN   {()}  (* Grouping *)
  | CHAR               {()}  (* Single character *)
  | re STAR            {()}  (* Kleene star *)
  | re re %prec CONCAT {()}  (* Sequencing / Concatenation *)
  | re PIPE re         {()}  (* Alternation *)

我认为这可能是因为 menhir 无法判断顺序应该是左关联的,但我不能 100% 确定这是否是问题的原因。

到目前为止,我能找到的唯一解决方案涉及将 re 规则分解成一堆不同的规则,使优先级和关联性明确:

%token LPAREN RPAREN
%token CHAR STAR PIPE
%token EOF

%start <unit> parse

%%

parse: re EOF {()}

re: re3 {()}

re0:
  | LPAREN re RPAREN {()}  (* Grouping *)
  | CHAR             {()}  (* Single character *)

re1:
  | re0              {()}
  | re0 STAR         {()}  (* Kleene star *)

re2:
  | re1              {()}
  | re2 re1          {()}  (* Sequencing / Concatenation *)

re3:
  | re2              {()}
  | re3 PIPE re2     {()}  (* Alternation *)

虽然最后一个示例工作正常,但我真的很好奇是否可以仅通过使用优先级和关联性声明来消除所有歧义和冲突,而无需重写语法。

嗯,首先,这不完全是 Menhir 的问题,而是 Menhir 接受的那种语法,它位于 LR(1) 集中。如果您提供的语法不需要优先级注释,则该语法被视为SLR(1)LR(1).

的子集

你的最后一个例子之所以有效,是因为你对每个优先级都有产生式(比如解析表达式语法,它们本质上是明确的),而且绝对是一种不错的编写方式;一些现代编译器使用这种表示法来处理更复杂的语言。

为了理解您的问题,让我们首先请 Menhir 向我们解释冲突所在:

menhir --explain parser.mly

它将生成一个 parser.conflicts 文件,其中解释了在哪些状态下它可以采取两种操作,reduce 和 shift:

...
** In state 8, looking ahead at STAR, shifting is permitted
** because of the following sub-derivation:

re re
   re . STAR

** In state 8, looking ahead at STAR, reducing production
** re -> re re
** is permitted because of the following sub-derivation:

re STAR // lookahead token appears
re re .

** Conflict (shift/reduce) in state 7.
** Tokens involved: STAR PIPE LPAREN CHAR
** The following explanations concentrate on token STAR.
** This state is reached from parse after reading:

re PIPE re

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

parse
re EOF
(?)

LR(1) 的语法确实有歧义,所以:

CHAR CHAR STAR

可以计算为:

  • (CHAR CHAR) STAR
  • CHAR (CHAR STAR)

另一种无冲突地重写语法的方法是通过 list:

实现连接
re:
| term PIPE re
| term { } (* Left associativity *)

term:
| list(base STAR* { }) { } (* Concatenation is taken by list *)

base:
| CHAR
| LPAREN re RPAREN { }