Lex and Yacc to make a compiler?

Question

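I'm starting a toy compiler, and I'm doing the simplest thing I can imagine, but it doesn't work.

The Lex file compiles, the Yacc file compiles, and they link together, but the resulting program doesn't do what I expect.

Lex: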

%{
#include <stdlib.h>
void yyerror(char *);
#include "y.tab.h"
%}

%%


a                       { 
                            yylval = atoi(yytext);
                            return AAA;
                        }
.                       yyerror("invalid character");

%%
int yywrap(void) {
 return 1;
}

Yacc:

%{
    void yyerror(char *);
    int yylex(void);
    int sym[26];
    #include <stdio.h>
%}

%token AAA

%%
daaaa:
AAA             {printf("%d\n", $1);}

%%

void yyerror(char *s) {
 fprintf(stderr, "%s\n", s);
}

int main(void) {
 yyparse();
 return 0;
}

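The program I'm trying to compile with this compiler is a file containing: a. That's it.

I don't know what's going wrong!

Clarification: what I want the compiler to do is accept a file, process it, and emit a compiled version of that file.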

Answer 1

The program doesn't accept a file because it was never told to.

In the Yacc program, extern FILE *yyin; must be added to the definitions section.

I believe that's all it takes.

Answer 2

Can you explain, maybe in an answer, exactly what you did and how it worked? As far as I can tell, and as far as I have tested the code in the question, it shouldn't work as you say.

I copied your code verbatim, creating the files grammar.y and lexer.l. Then I compiled the code. I'm working on Mac OS X 10.11.4, using GCC 6.1.0, Bison 2.3 (masquerading as yacc), and Flex 2.5.35 (masquerading as lex).

$ yacc -d grammar.y
$ lex lexer.l
$ gcc -o gl y.tab.c lex.yy.c
$ ./gl <<< 'a'
0

$

I then made two changes. In grammar.y, I changed main() to:

int main(void) {
 #if YYDEBUG
 yydebug = 1;
 #endif
 yyparse();
 return 0;
}

And in lexer.l, I changed the default character rule to:

\n|.                    yyerror("invalid character");

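(The . doesn't match a newline, so in the original run the newline after the a in the input fell through to the default action and was echoed to the output.)

Compiled the same way as before, the output becomes: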

$ ./gl <<< 'a'
0
invalid character
$

Compiling with -DYYDEBUG specified as well:

$ gcc -DYYDEBUG -o gl lex.yy.c y.tab.c
$

The output now includes useful debugging information:

$ ./gl <<< 'a'
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
    = token AAA ()
0
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: invalid character
Now at end of input.
Stack now 0 2
Cleanup: popping nterm daaaa ()
$ ./gl <<< 'aa'
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
    = token AAA ()
0
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: Next token is token AAA ()
syntax error
Error: popping nterm daaaa ()
Stack now 0
Cleanup: discarding lookahead token AAA ()
Stack now 0
$

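The second a in the input correctly triggers a syntax error (the grammar doesn't allow it). Other characters are allowed: each one generates the 'invalid character' message and is otherwise ignored (so ./gl <<< 'abc' generates 3 invalid character messages, one for the b, one for the c, and one for the newline).

Changing the assignment to yylval in lexer.l to: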

yylval = 'a'; // atoi(yytext);

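changes the number printed from 0 to 97, which is the character code for 'a' in ASCII, ISO 8859-1, Unicode, etc. The original action printed 0 because atoi() returns 0 when its argument does not start with a number, and yytext is "a" here.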
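The difference between the two actions can be checked outside the lexer. This is only a small sketch: the helper names token_value_atoi and token_value_char are hypothetical, made up for this illustration, and they mimic what each version of the action would store in yylval for a given yytext.

```c
#include <stdlib.h>

/* Hypothetical helper: the value `yylval = atoi(yytext);` would store.
 * atoi() yields 0 when the text does not begin with a decimal number. */
int token_value_atoi(const char *text) {
    return atoi(text);
}

/* Hypothetical helper: the character code of the first character,
 * which is what `yylval = 'a';` stores when the matched text is "a". */
int token_value_char(const char *text) {
    return (unsigned char)text[0];
}
```

With the text "a", token_value_atoi gives 0, while token_value_char gives 97 on any system with an ASCII-compatible character set.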

I've been using a here string as the source of data. Using a file as input works equally well:

$ echo a > program
$ cat program
a
$ ./gl < program
Starting parse
Entering state 0
Reading a token: Next token is token AAA ()
Shifting token AAA ()
Entering state 1
Reducing stack by rule 1 (line 12):
    = token AAA ()
97
-> $$ = nterm daaaa ()
Stack now 0
Entering state 2
Reading a token: invalid character
Now at end of input.
Stack now 0 2
Cleanup: popping nterm daaaa ()
$

If you want to read files named on the command line, you have to write more code in main() to handle those files.
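A minimal sketch of that extra code, under some assumptions: open_input is a hypothetical helper (it is not part of lex or yacc), and treating "-" as standard input is a convention chosen for this example. The generated lexer reads from the stream stored in yyin, so main() can assign the helper's result to yyin before calling yyparse().

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pick the stream the lexer should read.
 * NULL or "-" means standard input; anything else is opened as a file.
 * Returns NULL, after reporting the error, if the file cannot be opened.
 */
FILE *open_input(const char *path) {
    if (path == NULL || strcmp(path, "-") == 0)
        return stdin;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        perror(path);
    return fp;
}

/*
 * Intended use in the last section of grammar.y (shown as a comment
 * because yyin and yyparse() only exist once the generated files are
 * linked in):
 *
 *     extern FILE *yyin;
 *
 *     int main(int argc, char **argv) {
 *         yyin = open_input(argc > 1 ? argv[1] : NULL);
 *         if (yyin == NULL)
 *             return 1;
 *         return yyparse();
 *     }
 */
```

Built this way, ./gl program and ./gl < program should behave the same, and running ./gl with no argument still reads standard input.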
