Multiple parsers using flex and bison
I'm working on a C++ database query engine project. At this point I should be able to parse schema SQL files (such as create table ...) and query SQL files (such as select ... from ...), so I have two parsers, one for each kind of SQL.
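My problem: each parser works fine when it is the only one I use. If I try to parse both a schema and a query in the same program, I get the following conflicts at link time:
(This is only part of the errors that follow, but I think it is the main cause and the main conflict.)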
> ninja
[9/10] Linking CXX executable imlabdb
FAILED: imlabdb
: && /usr/bin/c++ -g -O0 -fsanitize=address CMakeFiles/imlabdb.dir/tools/imlabdb.cc.o -o imlabdb libimlab.a libschema.a libquery.a vendor/gflags/lib/libgflags.a -pthread && :
libquery.a(query_scanner.cc.o):(.bss+0x140): multiple definition of `yyleng'
libschema.a(schema_scanner.cc.o):(.bss+0x140): first defined here
libquery.a(query_scanner.cc.o):(.bss+0x280): multiple definition of `yyin'
libschema.a(schema_scanner.cc.o):(.bss+0x280): first defined here
libquery.a(query_scanner.cc.o):(.bss+0x2c0): multiple definition of `yyout'
libschema.a(schema_scanner.cc.o):(.bss+0x2c0): first defined here
libquery.a(query_scanner.cc.o):(.data+0x0): multiple definition of `yylineno'
libschema.a(schema_scanner.cc.o):(.data+0x0): first defined here
libquery.a(query_scanner.cc.o):(.data+0x40): multiple definition of `yy_flex_debug'
libschema.a(schema_scanner.cc.o):(.data+0x40): first defined here
libquery.a(query_scanner.cc.o):(.bss+0x380): multiple definition of `yytext'
libschema.a(schema_scanner.cc.o):(.bss+0x380): first defined here
As you can see, I build with ninja. I think local.make and CMakeLists.txt are fine, so I've left them out here.
I've tried to keep the code as short as possible.
imlabdb.cc
#include <fstream>
#include "imlab/schemac/schema_parse_context.h"
#include "imlab/queryc/query_parse_context.h"
int main(int argc, char *argv[]) {
// Parse the schema first (create table ...)
imlab::schemac::SchemaParseContext schema_parse_context;
std::ifstream in_schema("../data/schema.sql"); // schema sql
auto schema = schema_parse_context.Parse(in_schema);
in_schema.close();
// Then parse a query (select ... from ...)
imlab::queryc::QueryParseContext query_parse_context;
std::ifstream in_query("../data/queryc_2.sql"); // query sql
auto query = query_parse_context.Parse(in_query);
in_query.close();
}
schema_parse_context.cc
Schema SchemaParseContext::Parse(std::istream &in) {
beginScan(in);
imlab::schemac::SchemaParser parser(*this);
parser.set_debug_level(trace_parsing_);
parser.parse();
endScan();
return {mySchema}; // a container for the create table tokens
}
query_parse_context.cc
Query QueryParseContext::Parse(std::istream &in) {
beginScan(in);
imlab::queryc::QueryParser parser(*this);
parser.set_debug_level(trace_parsing_);
parser.parse();
endScan();
return {myQuery}; // a container for queries
}
Next, here are the flex and bison files for the schema.
schema_scanner.l, with most of the unnecessary tokens removed:
%{
// Header
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>
#include <istream>
#include "imlab/schemac/schema_parse_context.h"
#include "./schema_parser.h"
namespace imlab {
namespace schemac {
// The location of the current token
extern imlab::schemac::location loc;
// The input stream of the scanner
extern std::istream *in;
} // namespace schemac
} // namespace imlab
using namespace imlab::schemac;
// Work around an incompatibility in flex (at least versions
// 2.5.31 through 2.5.33): it generates code that does
// not conform to C89. See Debian bug 333231
// <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.
#undef yywrap
#define yywrap() 1
// Declare the yylex function
#define YY_DECL SchemaParser::symbol_type yylex(SchemaParseContext& sc)
// Configure the scanner to use istreams
#define YY_INPUT(buffer, result, max_size) \
result = 0; \
while (true) { \
int c = in->get(); \
if (in->eof()) break; \
buffer[result++] = c; \
if (result == max_size || c == '\n') break; \
}
%}
%{
// ---------------------------------------------------------------------------------------------------
// Options
// ---------------------------------------------------------------------------------------------------
%}
%{
// noyywrap: Disable yywrap (EOF == end of parsing)
// nounput: Disable manipulation of input stream
// noinput: Disable explicit fetch of the next character
// batch: Scanner in batch-mode (vs. interactive)
// debug: Write debug info to stderr
// caseless: Case-insensitive pattern matching
%}
%option noyywrap
%option nounput
%option noinput
%option batch
%option debug
%option caseless
%{
// Code run each time a token is matched.
// We just update the location of the token.
#define YY_USER_ACTION { loc.columns(yyleng); }
%}
%%
%{
// Code runs each time yylex is called.
// Set the beginning of the token to the end of the previous token.
loc.step ();
%}
[ \t]+ { loc.step(); }
"\n" { loc.lines (yyleng); loc.step (); }
";" { return SchemaParser::make_SEMICOLON(loc); }
%%
// ---------------------------------------------------------------------------------------------------
// Code
// ---------------------------------------------------------------------------------------------------
// The location of the current token
imlab::schemac::location imlab::schemac::loc;
// The input stream of the scanner
std::istream *imlab::schemac::in = nullptr;
// Begin a scan
void imlab::schemac::SchemaParseContext::beginScan(std::istream &is) {
yy_flex_debug = trace_scanning_;
in = &is;
}
// End a scan
void imlab::schemac::SchemaParseContext::endScan() {
in = nullptr;
}
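The YY_INPUT macro in the scanner above reads from the std::istream one character at a time and hands flex at most one line per call. The same loop written as an ordinary function makes the behaviour easy to check in isolation; fill_buffer is a hypothetical name used only for this sketch, not part of flex or of the project.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Same logic as the YY_INPUT macro: copy characters from `in` into `buffer`
// until the buffer is full, a newline has been copied, or the stream ends.
// Returns the number of characters written; 0 means end of input.
// `fill_buffer` is a hypothetical name for illustration only.
std::size_t fill_buffer(std::istream& in, char* buffer, std::size_t max_size) {
  std::size_t result = 0;
  while (true) {
    int c = in.get();
    if (in.eof()) break;                          // stream exhausted
    buffer[result++] = static_cast<char>(c);
    if (result == max_size || c == '\n') break;   // buffer full or end of line
  }
  return result;
}
```

flex treats a result of 0 as end of input, which is what this loop returns once the stream is exhausted.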
schema_parser.y, also with unnecessary tokens and rules removed:
%skeleton "lalr1.cc"
%require "3.0.4"
// ---------------------------------------------------------------------------------------------------
// Write a parser header file
%defines
// Define the parser class name
%define parser_class_name {SchemaParser}
// Create the parser in our namespace
%define api.namespace { imlab::schemac }
// Use C++ variant to store the values and get better type warnings (compared to "union")
%define api.value.type variant
// With variant-based values, symbols are handled as a whole in the scanner
%define api.token.constructor
// Prefix all tokens
%define api.token.prefix {SCHEMA_}
// Check if variants are constructed and destroyed properly
%define parse.assert
// Trace the parser
%define parse.trace
// Use verbose parser errors
%define parse.error verbose
// Enable location tracking.
%locations
// Pass the compiler as parameter to yylex/yyparse.
%param { imlab::schemac::SchemaParseContext &sc }
// ---------------------------------------------------------------------------------------------------
// Added to the header file and parser implementation before bison definitions.
// We include string for string tokens and forward declare the SchemaParseContext.
%code requires {
#include <string>
#include <vector>
#include "imlab/schemac/schema_parse_context.h"
}
// ---------------------------------------------------------------------------------------------------
// Import the compiler header in the implementation file
%code {
imlab::schemac::SchemaParser::symbol_type yylex(imlab::schemac::SchemaParseContext& sc);
}
%code {
std::string insertTableId;
int positionToInsert;
}
// ---------------------------------------------------------------------------------------------------
// Token definitions but deleted the most of them
%token <int> INTEGER_VALUE "integer_value"
%token <std::string> IDENTIFIER "identifier"
// ---------------------------------------------------------------------------------------------------
// Define error function
void imlab::schemac::SchemaParser::error(const location_type& l, const std::string& m) {
sc.Error(l.begin.line, l.begin.column, m);
}
Now the flex and bison files for the query, which are written in almost the same way.
query_scanner.l
%{
// Header
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>
#include <istream>
#include "imlab/queryc/query_parse_context.h"
#include "./query_parser.h"
namespace imlab {
namespace queryc {
// The location of the current token
extern imlab::queryc::location loc;
// The input stream of the scanner
extern std::istream *in;
} // namespace queryc
} // namespace imlab
using namespace imlab::queryc;
// Work around an incompatibility in flex (at least versions
// 2.5.31 through 2.5.33): it generates code that does
// not conform to C89. See Debian bug 333231
// <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=333231>.
#undef yywrap
#define yywrap() 1
// Declare the yylex function
#define YY_DECL QueryParser::symbol_type yylex(QueryParseContext& sc)
// Configure the scanner to use istreams
#define YY_INPUT(buffer, result, max_size) \
result = 0; \
while (true) { \
int c = in->get(); \
if (in->eof()) break; \
buffer[result++] = c; \
if (result == max_size || c == '\n') break; \
}
%}
%{
// ---------------------------------------------------------------------------------------------------
// Options
// ---------------------------------------------------------------------------------------------------
%}
%option noyywrap
%option nounput
%option noinput
%option batch
%option debug
%option caseless
%{
// Code run each time a token is matched: update the token location.
#define YY_USER_ACTION { loc.columns(yyleng); }
%}
%%
%{
loc.step ();
%}
[ \t]+ { loc.step(); }
"\n" { loc.lines (yyleng); loc.step (); }
";" { return QueryParser::make_SEMICOLON(loc); }
%%
// ---------------------------------------------------------------------------------------------------
// Code
// ---------------------------------------------------------------------------------------------------
// The location of the current token
imlab::queryc::location imlab::queryc::loc;
// The input stream of the scanner
std::istream *imlab::queryc::in = nullptr;
// Begin a scan
void imlab::queryc::QueryParseContext::beginScan(std::istream &is) {
yy_flex_debug = trace_scanning_;
in = &is;
}
// End a scan
void imlab::queryc::QueryParseContext::endScan() {
in = nullptr;
}
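Both scanners track token positions with three calls on loc: step() at the start of yylex and after whitespace, columns(yyleng) in YY_USER_ACTION for every match, and lines(yyleng) for newlines. The MiniLocation type below is a simplified, hypothetical stand-in for bison's generated location class (the real one also stores a file name and clamps values), written only to show how those calls move the token boundaries.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for bison's position class.
struct MiniPosition {
  int line = 1;
  int column = 1;
};

// Hypothetical, simplified stand-in for bison's location class,
// mirroring only the three members the scanners use.
struct MiniLocation {
  MiniPosition begin, end;
  // Start a new token where the previous one ended.
  void step() { begin = end; }
  // Extend the current token by `count` columns.
  void columns(int count = 1) { end.column += count; }
  // Advance `count` lines; the column restarts at 1.
  void lines(int count = 1) {
    if (count) {
      end.line += count;
      end.column = 1;
    }
  }
};
```

For the input "  \n;", the scanner actions leave the ";" token spanning line 2, columns 1 to 2, as the test traces step by step.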
query_parser.y
%skeleton "lalr1.cc"
%require "3.0.4"
// ---------------------------------------------------------------------------------------------------
%defines
%define parser_class_name {QueryParser}
%define api.namespace { imlab::queryc }
%define api.value.type variant
%define api.token.constructor
%define api.token.prefix {QUERY_}
%define parse.assert
%define parse.trace
%define parse.error verbose
%locations
%param { imlab::queryc::QueryParseContext &sc }
// ---------------------------------------------------------------------------------------------------
%code requires {
#include <string>
#include <vector>
#include "imlab/queryc/query_parse_context.h"
}
// ---------------------------------------------------------------------------------------------------
%code {
imlab::queryc::QueryParser::symbol_type yylex(imlab::queryc::QueryParseContext& sc);
}
// %code {
// std::string insertTableId;
// int positionToInsert;
// }
// ---------------------------------------------------------------------------------------------------
// Token definitions -- most of them deleted
%token <int> INTEGER_VALUE "integer_value"
%token <std::string> IDENTIFIER "identifier"
// ---------------------------------------------------------------------------------------------------
// Define error function
void imlab::queryc::QueryParser::error(const location_type& l, const std::string& m) {
sc.Error(l.begin.line, l.begin.column, m);
}
// ---------------------------------------------------------------------------------------------------
In short:
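What I can say for sure: if I use only one of them in imlabdb.cc, both flex/bison pairs and everything else, such as imlabdb.cc, work fine. The conflicts shown at the top only occur when I try to use both parsers together.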
What I want: to use both parsers together without the naming conflicts.
Thanks!
(If I removed too much code or the current code isn't enough, please let me know and I'll fix it.)
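Schema (DDL) and query (DML) statements are both part of the SQL language. You don't need two lexers/parsers, just one, based on an SQL grammar definition. See, for example, the "YACC SQL Grammar Reference".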
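One reason a single grammar is enough here: the first keyword already tells the two statement kinds in the question apart, so a grammar whose top-level rule chooses between, say, a create_table rule and a select rule can decide after one token. The hypothetical classify helper below only illustrates that observation in plain C++ (case-insensitively, like %option caseless); it is not a grammar and is not taken from the YACC SQL Grammar Reference.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical illustration: which kind of statement does the input start with?
enum class StatementKind { Ddl, Dml, Unknown };

StatementKind classify(const std::string& sql) {
  std::istringstream in(sql);
  std::string word;
  in >> word;  // first whitespace-separated word, leading blanks skipped
  for (char& ch : word) {
    ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
  }
  if (word == "CREATE") return StatementKind::Ddl;  // e.g. create table ...
  if (word == "SELECT") return StatementKind::Dml;  // e.g. select ... from ...
  return StatementKind::Unknown;
}
```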
Although @serge is right about your specific problem, I'll answer the question more generally.
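Flex and bison generate C files containing a bunch of public functions and variables, such as yylex and yyparse. That is unfortunate if you have two parsers or lexers in one program: the program doesn't know which yylex and yyparse to call. So you need to give them different names.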
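A nuance for this particular C++ setup: the linker log in the question lists only flex's variables (yyleng, yyin, yyout, yylineno, yy_flex_debug, yytext), not yylex. Each YY_DECL declares yylex with a different parameter type, and C++ treats functions with different parameter lists as distinct overloads with distinct linker symbols. The sketch uses hypothetical stand-ins (SchemaCtx, QueryCtx) for the two context classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for SchemaParseContext and QueryParseContext.
struct SchemaCtx { int calls = 0; };
struct QueryCtx { int calls = 0; };

// Two global functions both named yylex. Because their parameter types
// differ, they are separate overloads with separately mangled names,
// so linking both into one program does not cause "multiple definition".
int yylex(SchemaCtx& sc) { ++sc.calls; return 1; }
int yylex(QueryCtx& qc) { ++qc.calls; return 2; }
```

The clash in the question therefore comes from the global variables, which are among the symbols the flex prefix renames.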
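To do this in flex, use the option
%option prefix="foo"
This renames every symbol with the yy prefix to use foo instead, so yylex becomes foolex. In your case, use one prefix for one lexer and a different prefix for the other.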
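To illustrate why this removes the linker errors above: with a prefix, the generated scanner contains macros such as #define yyleng fooleng and #define yy_flex_debug foo_flex_debug, so code written with the yy names (including beginScan's yy_flex_debug = trace_scanning_;) ends up defining and using differently named globals. The sketch below simulates two generated scanners in one file with hand-written macros; the prefixes schema and query and the helper names are assumptions for illustration, not actual flex output.

```cpp
#include <cassert>

// Simulation only: hand-written macros standing in for what flex emits
// with %option prefix="schema" and %option prefix="query".

// ---- stands in for schema_scanner.cc ----
#define yyleng schemaleng
#define yy_flex_debug schema_flex_debug
int yyleng = 0;         // expands to: int schemaleng = 0;
int yy_flex_debug = 0;  // expands to: int schema_flex_debug = 0;
// Hypothetical counterpart of SchemaParseContext::beginScan.
void schema_begin_scan(bool trace) { yy_flex_debug = trace; }
#undef yyleng
#undef yy_flex_debug

// ---- stands in for query_scanner.cc ----
#define yyleng queryleng
#define yy_flex_debug query_flex_debug
int yyleng = 0;         // expands to: int queryleng = 0;
int yy_flex_debug = 0;  // expands to: int query_flex_debug = 0;
// Hypothetical counterpart of QueryParseContext::beginScan.
void query_begin_scan(bool trace) { yy_flex_debug = trace; }
#undef yyleng
#undef yy_flex_debug
```

Without the #define lines, both halves would define the same global yyleng; in two separate object files that is exactly the "multiple definition" error from the question.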
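The same applies to bison, only the syntax differs. Depending on the bison version you use, either (newer):
%define api.prefix {foo}
or (older):
%name-prefix "foo"
Bison will then generate fooparse instead of yyparse. Since the prefix also renames the yylex the parser calls, give each scanner/parser pair the same prefix so that each parser calls its own renamed scanner.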