How to use "literal string tokens" in Bison

Question

I am learning Flex/Bison. The Bison manual says:

A literal string token is written like a C string constant; for example, "<=" is a literal string token. A literal string token doesn’t need to be declared unless you need to specify its semantic value data type

But I don't know how to use one, and I haven't found an example.

I have the following test code:

example.l

%option noyywrap nodefault

%{
#include "example.tab.h"
%}

%%

[ \t\n] {;}
[0-9] { return NUMBER; }
. { return yytext[0]; }

%%

example.y

%{
#include <stdio.h>
#define YYSTYPE char *
%}

%token NUMBER

%%

start: %empty | start tokens

tokens:
       NUMBER "<=" NUMBER { printf("<="); }
     | NUMBER "=>" NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }

%%

main(int argc, char **argv) {
   yyparse();
}

yyerror(char *s) {
   fprintf(stderr, "error: %s\n", s);
}

Makefile

#!/usr/bin/make
# by RAM

all: example

example.tab.c example.tab.h: example.y
    bison -d $<

lex.yy.c: example.l example.tab.h
    flex $<

example: lex.yy.c example.tab.c
    cc -o $@ example.tab.c lex.yy.c -lfl

clean:
    rm -fr example.tab.c example.tab.h lex.yy.c example

When I run it:

$ ./example 
3<4
<
6>9
>
6=>9
error: syntax error

Any ideas?

Update: to clarify, I know there are alternative ways to solve this, but I want to use literal string tokens.

One option: use multiple "literal character tokens":

tokens:
       NUMBER '<' '=' NUMBER { printf("<="); }
     | NUMBER '=' '>' NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }

When I run it:

$ ./example 
3<=9
<=

Another option:

In example.l:

"<="  { return LE; }
"=>"  { return GE; }

In example.y:

...
%token NUMBER
%token LE "<="
%token GE "=>"

%%

start: %empty | start tokens

tokens:
       NUMBER "<=" NUMBER { printf("<="); }
     | NUMBER "=>" NUMBER { printf("=>\n"); }
     | NUMBER '>' NUMBER { printf(">\n"); }
     | NUMBER '<' NUMBER { printf("<\n"); }
...

When I run it:

$ ./example 
3<=4
<=

But the manual says:

A literal string token doesn’t need to be declared unless you need to specify its semantic value data type

Answer 1

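I haven't used flex/bison for a while, but two things:

As far as I remember, . only matches a single character. yytext is a pointer to a null-terminated string (char *), so yytext[0] is a char, which means you cannot match strings this way. You probably need to change it to return yytext. Otherwise . will probably create one token PER character, and you would probably have to write NUMBER '<' '=' NUMBER.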
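
To see concretely what the question's scanner hands to the parser, here is a small C sketch that mimics the three rules of example.l by hand. It is not flex output: demo_next_token, demo_tokenize and DEMO_NUMBER are names made up for this illustration, and 258 is only a placeholder for whatever value Bison assigns to NUMBER.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NUMBER token code from example.tab.h. */
enum { DEMO_NUMBER = 258 };

/* Mimics the rules in example.l: skip blanks, one digit -> NUMBER,
 * any other single character -> its own character code.
 * Advances *p past the consumed input; returns 0 at end of input. */
int demo_next_token(const char **p)
{
    assert(p != NULL && *p != NULL);
    while (**p == ' ' || **p == '\t' || **p == '\n')
        ++*p;
    if (**p == '\0')
        return 0;
    unsigned char c = (unsigned char)**p;
    ++*p;
    return (c >= '0' && c <= '9') ? DEMO_NUMBER : c;
}

/* Fills out[] with up to max token codes for input; returns the count. */
size_t demo_tokenize(const char *input, int *out, size_t max)
{
    size_t n = 0;
    int tok;
    while (n < max && (tok = demo_next_token(&input)) != 0)
        out[n++] = tok;
    return n;
}
```

For the input 6=>9 this produces four tokens: NUMBER, '=', '>', NUMBER. None of them is the "=>" token that the grammar rule expects, which is consistent with the syntax error shown in the question.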

Answer 2

The manual paragraph you quote is correct, but you also need to read the next paragraph:

You can associate the literal string token with a symbolic name as an alias, using the %token declaration (see Token Declarations). If you don’t do that, the lexical analyzer has to retrieve the token number for the literal string token from the yytname table.

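So you don't need to declare the literal string token, but you still need to arrange for the lexer to send the correct token number, and if you don't declare an associated token name, the only way to find the correct value is to search for the code in the yytname table.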

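In short, your last example, where you define LE and GE as aliases, is by far the most common approach. Splitting the token into individual characters is not a good idea; it might create shift-reduce conflicts, and it will certainly allow invalid input, such as whitespace between the characters.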
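
As a rough illustration of the whitespace point, here is a C sketch of what the scanner does once "<=" and "=>" have their own rules. In the real project flex generates this logic from the rules in example.l; demo_yylex and the DEMO_* codes are hypothetical names and values used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; in a real build these come from example.tab.h. */
enum { DEMO_NUMBER = 258, DEMO_LE = 259, DEMO_GE = 260 };

/* Hand-written approximation of the scanner with the two extra rules:
 * "<=" -> LE, "=>" -> GE, one digit -> NUMBER, anything else -> itself.
 * Advances *p past the consumed input; returns 0 at end of input. */
int demo_yylex(const char **p)
{
    assert(p != NULL && *p != NULL);
    const char *s = *p;
    int tok;

    while (*s == ' ' || *s == '\t' || *s == '\n')
        ++s;

    if (*s == '\0') {
        tok = 0;
    } else if (s[0] == '<' && s[1] == '=') {
        tok = DEMO_LE;
        s += 2;
    } else if (s[0] == '=' && s[1] == '>') {
        tok = DEMO_GE;
        s += 2;
    } else if (*s >= '0' && *s <= '9') {
        tok = DEMO_NUMBER;
        s += 1;
    } else {
        tok = (unsigned char)*s;
        s += 1;
    }
    *p = s;
    return tok;
}
```

With 3<=4 the second token is LE. With 3< =4 the scanner returns '<' and then '=', which the rule NUMBER "<=" NUMBER rejects, whereas the grammar written as NUMBER '<' '=' NUMBER would accept both inputs.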

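If you want to try the yytname solution, there is sample code in the bison manual. But be aware that this code discovers bison's internal token number, which is not the number that needs to be returned from the scanner. There is no way to get the external token number that is easy, portable and documented; the easy and undocumented way is to look the token number up in yytoknum, but since that array is undocumented and conditional on a preprocessor macro, there is no guarantee that it will work. Note also that these tables are declared static, so the functions that rely on them must be included in the bison input file. (Of course, these functions can have external linkage so that they can be called from the lexer. But you can't just use yytname directly in the lexer.)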
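
The lookup described above can be sketched outside a real parser. In the following C fragment, demo_tname and demo_toknum are hand-made stand-ins for the yytname and yytoknum tables; the real ones are generated by bison, and their contents and numbering will differ. The search follows the general shape of the manual's sample, relying on the fact that a literal string token is recorded in yytname with its surrounding double quotes. All function names and numbers here are invented for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Mock of bison's yytname: literal string tokens are stored with their
 * surrounding double quotes. Contents are invented for this demo. */
static const char *const demo_tname[] = {
    "$end", "error", "$undefined", "NUMBER", "\"<=\"", "\"=>\"", "'>'", "'<'"
};

/* Mock of yytoknum: the external code for each internal index (invented). */
static const int demo_toknum[] = { 0, 256, 257, 258, 259, 260, '>', '<' };

enum { DEMO_NTOKENS = sizeof demo_tname / sizeof demo_tname[0] };

/* Returns the internal symbol index of the literal string token whose
 * text is `text`, or -1 if the table has no such token. */
int demo_find_literal(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    for (int i = 0; i < DEMO_NTOKENS; i++) {
        const char *name = demo_tname[i];
        if (name[0] == '"'
            && strncmp(name + 1, text, len) == 0
            && name[len + 1] == '"'
            && name[len + 2] == '\0')
            return i;
    }
    return -1;
}

/* Maps the internal index to the code the scanner would have to return. */
int demo_external_code(const char *text)
{
    int i = demo_find_literal(text);
    return i < 0 ? -1 : demo_toknum[i];
}
```

In this mock, demo_find_literal("<=") gives the internal index 4, while demo_external_code("<=") gives 259. That gap is the internal versus external numbering issue: returning the index found in the name table directly from the scanner would hand the parser the wrong token.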
