不清楚 yacc/bison 生产规范如何导致堆栈溢出

Question

这不是作业，而是来自一本书。我得到以下语法：

%{
#include <stdio.h>
#include <ctype.h>

int yylex();
int yyerror();
%}

%%

command : exp '\n' { printf("%d\n", ); exit(0); }
        | error '\n'
          { 
            yyerrok;
            printf("reenter expression: "); 
          }
          command
        ;

exp : exp '+' term { $$ =  + ; }
    | exp '-' term { $$ =  - ; }
    | term { $$ = ; }
    ;

term : term '*' factor { $$ =  * ; }
     | factor { $$ = ; }
     ;

factor : NUMBER { $$ = ; }
       | '(' exp ')' { $$ = ; }
       ;

%%

int main() {
  return yyparse();
}

int yylex() {
  int c;

  /* eliminate blanks*/
  while((c = getchar()) == ' ');

  if (isdigit(c)) {
    ungetc(c, stdin);
    scanf("%d\n", &yylval);
    return (NUMBER);
  }

  /* makes the parse stop */
  if (c == '\n') return 0;

  return (c);
}

int yyerror(char * s) {
  fprintf(stderr, "%s\n", s);
  return 0;
} /* allows for printing of an error message */

这是任务：

The simple error recovery technique suggested for the calculator program is flawed in that it could cause stack overflow after many errors. Rewrite it to remove this problem.

我真的搞不懂堆栈溢出是怎么发生的。考虑到开始生产是唯一一个有错误标记的生产，难道不会 yacc/bison 在重新启动之前弹出堆栈上的所有元素吗？

Answer 1

如有疑问，最简单的方法就是使用 bison。

为了避免错误，我稍微修改了程序。首先，由于新程序依赖于查看 '\n' 个标记，我删除了 if (c == '\n') return 0; 行，它会禁止发送 '\n'。其次，我将 scanf("%d\n", &yylval); 固定为 scanf("%d", &yylval);。没有理由吞下数字后面的空格，特别是 如果数字后面的空格是换行符。（然而，scanf 模式不区分不同种类的空格，因此模式 "%d\n" 与 "%d " 具有完全相同的语义。这两个都不正确。）

然后我在 main 的顶部添加了 yydebug = 1; 行，并在构建计算器时向 bison 提供了 -t ("trace") 选项。这会导致解析器在处理输入时详细显示其进度。

获取状态 table 转储有助于查看发生了什么。您可以使用 -v bison 选项来做到这一点。不过，我会把它留给读者。

然后我运行程序故意打了一个语法错误：

./error
Starting parse
Entering state 0
Reading a token: 2++3

trace 工具已经输出了两行，但是在我给它一些输入之后，trace 就倾泻而出。

首先解析器吸收了NUMBER2和运算符+：（注：下面的nterm是bison的说法"non-terminal"，而token 是一个 "terminal"；堆栈只显示状态编号。）

Next token is token NUMBER ()
Shifting token NUMBER ()
Entering state 2
Reducing stack by rule 9 (line 25):
    = token NUMBER ()
-> $$ = nterm factor ()
Stack now 0
Entering state 7
Reducing stack by rule 8 (line 22):
    = nterm factor ()
-> $$ = nterm term ()
Stack now 0
Entering state 6
Reading a token: Next token is token '+' ()
Reducing stack by rule 6 (line 18):
    = nterm term ()
-> $$ = nterm exp ()
Stack now 0
Entering state 5
Next token is token '+' ()
Shifting token '+' ()
Entering state 12

到目前为止，还不错。状态 12 是我们在看到 + 之后到达的地方；这是它的定义：

State 12

    4 exp: exp '+' . term
    7 term: . term '*' factor
    8     | . factor
    9 factor: . NUMBER
   10       | . '(' exp ')'

    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

    term    go to state 17
    factor  go to state 7

（默认情况下，bison 不会将状态 table 与非核心项目混为一谈。我添加了 -r itemset 以获得完整的项目集，但这很容易做到手动关闭。）

因为在这种状态下我们正在寻找 + 的右手操作运行d，所以只有可以开始表达式的东西才有效：NUMBER 和 (。但这不是我们所拥有的：

Reading a token: Next token is token '+' ()
syntax error

好的，我们处于状态 12，如果您查看上面的状态描述，您会发现 error 也不在前瞻集中。所以：

Error: popping token '+' ()
Stack now 0 5

这让我们回到了状态 5，这是预期的操作员：

State 5

    1 command: exp . '\n'
    4 exp: exp . '+' term
    5    | exp . '-' term

    '\n'  shift, and go to state 11
    '+'   shift, and go to state 12
    '-'   shift, and go to state 13

因此该州在 error 上也没有 t运行sition。以后。

Error: popping nterm exp ()
Stack now 0

OK，回到开头。状态 0 是否具有 error t运行位置：

   error   shift, and go to state 1

所以现在我们可以移动 error 令牌并进入状态 1，如 t运行位置 table:

所示

Shifting token error ()
Entering state 1

现在我们需要通过跳过输入标记来同步输入，直到我们到达换行标记。（请注意，野牛在执行此操作时实际上会弹出并推送错误标记。尽量不要让它分散您的注意力。）

Next token is token '+' ()
Error: discarding token '+' ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token NUMBER ()
Error: discarding token NUMBER ()
Error: popping token error ()
Stack now 0
Shifting token error ()
Entering state 1
Reading a token: Next token is token '\n' ()
Shifting token '\n' ()
Entering state 8

好的，我们找到换行符了。状态 5 是 command: error '\n' . $@1 command。 $@1 是 bison 插入的标记（空生产）的名称，以代替中间规则操作 (MRA)。状态 8 将减少此标记，导致 MRA 变为运行，这要求我提供更多输入。请注意，此时错误恢复已完成。我们现在处于完全正常的状态，堆栈反映了这样一个事实，即我们按顺序拥有开始（状态 0）、error 标记（状态 1）和换行标记（状态 8）：

Reducing stack by rule 2 (line 13):
-> $$ = nterm $@1 ()
Stack now 0 1 8
Entering state 15
Reading a token: Try again:

MRA 减少后，从状态 8 开始采取相应的操作，我们继续进行状态 15（为了避免混乱，我省略了非核心项目）：

State 15

    3 command: error '\n' $@1 . command

    error   shift, and go to state 1
    NUMBER  shift, and go to state 2
    '('     shift, and go to state 3

所以现在我们已经准备好解析 b运行d 新命令，正如预期的那样。但是我们还没有减少错误的产生；它仍然在堆栈中，因为在减少点后面的 command 之前它不能被减少。我们甚至还没有开始。

但重要的是要注意州 15 确实在 error 上有一个 t运行sition，正如您从州的 goto [=124] 中看到的那样=].它有 t运行sition 因为闭包包括 command:

的两个产生式

    1 command: . exp '\n'
    3        | . error '\n' $@1 command

以及 exp、term 和 factor 的作品，它们也是闭包的一部分。

那么如果我们现在输入另一个错误会怎样？堆栈将被弹出回到这一点 (0 1 8 15)，一个新的 error 令牌将被压入堆栈 (0 1 8 15 1)，令牌将被丢弃直到可以换行 ( 0 1 8 15 1 8) 和一个新的 MRA（$@1，野牛称之为）将被缩减到堆栈中（0 1 8 15 1 8 15），此时我们准备开始解析另一次尝试。

希望你能看到这是怎么回事。

请注意，它与任何其他右递归产生式的效果确实没有什么不同。如果语法尝试接受多个表达式：

prog: exp '\n'
    | exp '\n' { printf("%d\n", ); } prog

您会看到相同的堆栈堆积，这就是不鼓励右递归的原因。（并且还因为您最终插入 MRA 以避免在所有输入结束时堆栈减少到 prog 时以相反的顺序看到结果。）

    command  go to state 20
    exp      go to state 5
    term     go to state 6
    factor   go to state 7

不清楚 yacc/bison 生产规范如何导致堆栈溢出

Unclear how a yacc/bison production spec can cause a stack overflow

parsing

yacc

bison