Node:Language definition, Next:Parsing keywords, Previous:Lexer/parser, Up:Internals
The file parse.y contains the "bison" source code of GNU Pascal's parser. This stage of the compilation analyzes and checks the syntax of your Pascal program, and it generates an intermediate, language-independent code which is then passed to the GNU back-end.
The bison language is essentially a machine-readable form of the Backus-Naur Form, a symbolic notation for the grammars of computer languages. "Syntax diagrams" are a graphical variant of the Backus-Naur Form.
For details about the "bison" language, see the Bison manual. A short overview of how to pick up the information you might need for programming follows.
Suppose you have forgotten how a variable is declared in Pascal. After some searching in parse.y you have found the following:
    /* variable declaration part */

    variable_declaration_part:
        LEX_VAR variable_declaration_list semi
      | LEX_VAR semi
          { error ("missing variable declaration"); }
      ;

    variable_declaration_list:
        variable_declaration
      | variable_declaration_list semi variable_declaration
          { yyerrok; }
      | error
      | variable_declaration_list error variable_declaration
          { error ("missing semicolon");
            yyerrok; }
      | variable_declaration_list semi error
      ;
Translated into English, this means: "The variable declaration part consists of the reserved word (lexical token) var followed by a `variable declaration list' and a semicolon. A semicolon immediately following var is an error. A `variable declaration list' in turn consists of one or more `variable declarations', separated by semicolons." (The latter explanation requires that you understand the recursive nature of the definition of variable_declaration_list.)
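The recursive definition can be made concrete with a small C sketch. It is not taken from GPC: the function name count_declarations is hypothetical, and it assumes, as a deliberate simplification, that a "declaration" is any non-empty run of characters other than a semicolon. The loop mirrors the left-recursive rule: the first declaration matches the base alternative, and every further "semicolon plus declaration" extends the list matched so far.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GPC: counts the declarations in a
   semicolon-separated list such as "a: Integer; b: Real".
   Simplifying assumption: a declaration is any non-empty run of
   characters that contains no ';'.
   Returns 0 if the list is malformed (an empty declaration). */
int count_declarations(const char *list)
{
    int count = 0;
    const char *p = list;

    for (;;) {
        /* Match one variable_declaration. */
        size_t len = 0;
        while (p[len] != '\0' && p[len] != ';')
            len++;
        if (len == 0)
            return 0;       /* nothing between separators: syntax error */

        /* First pass: list -> declaration.
           Later passes: list -> list semi declaration. */
        count++;
        p += len;

        if (*p == '\0')
            return count;   /* end of the list */
        p++;                /* skip the separating ';' */
    }
}
```

Note that a trailing semicolon makes this sketch report an error. That matches the grammar above, where the final semicolon belongs to variable_declaration_part and not to the list itself.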
Now we can go on and search for variable_declaration.
    variable_declaration:
        id_list
          { [...] }
        enable_caret ':' optional_qualifier_list type_denoter
          { [...] }
        absolute_or_value_specification
          { [...] }
      ;
(The [...] are placeholders for some C statements which aren't important for understanding GPC's grammar.)
From this you can see that a variable declaration in GNU Pascal consists of an "id list", followed by "enable_caret" (whatever that means), a colon, an "optional qualifier list", a "type denoter", and an "absolute or value specification". Some of these parts are easy to understand; the others you can look up in parse.y. Remember that the reserved word var precedes all this, and a semicolon follows it.
Now you know how to extract the exact grammar of the GNU Pascal language from the source.
The C statements, not shown above, are in some sense the most important part of the bison source, because they are responsible for generating the intermediate code of the GNU Pascal front-end, the so-called tree nodes (which are used to represent most things in the compiler). For instance, the C code in "type denoter" returns (assigns to $$) information about the type in a variable of type tree.
The "variable declaration" gets this and other information in the numbered arguments ($1 etc.) and passes it to some C functions declared in the other source files. Generally, those functions do the real work, while the main job of the C statements in the parser is to call them and pass their arguments around.
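The division of labour between actions and helper functions can be sketched in plain C. Nothing here is GPC's actual code: the node struct is a much simplified stand-in for the real tree type, and build_type, build_var_decl and the two action_ functions are hypothetical names chosen for illustration. Each action_ function plays the role of one { [...] } block: its parameters correspond to the numbered arguments ($1, $2, ...) and its return value corresponds to $$.

```c
#include <stddef.h>

/* Simplified stand-in for the compiler's tree nodes (assumption:
   the real tree type carries far more information). */
typedef struct node {
    const char *kind;          /* "type" or "var_decl" */
    const char *name;          /* type name or variable identifier */
    const struct node *type;   /* declared type, NULL for type nodes */
} node;

/* Hypothetical "real work" functions, of the kind that would live
   in other source files. */
node build_type(const char *type_name)
{
    node n = { "type", type_name, NULL };
    return n;
}

node build_var_decl(const char *id, const node *type)
{
    node n = { "var_decl", id, type };
    return n;
}

/* Role of the action in "type denoter": $$ = build_type ($1); */
node action_type_denoter(const char *arg1)
{
    return build_type(arg1);
}

/* Role of the action in "variable declaration": it receives the
   values computed by its components and only passes them on. */
node action_variable_declaration(const char *arg1_id,
                                 const node *arg2_type)
{
    return build_var_decl(arg1_id, arg2_type);
}
```

For a declaration such as "x: Integer", the parser would first run the type action on "Integer", then hand its result to the declaration action together with the identifier "x". The action functions themselves contain no logic beyond the call, which is the point made above.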
This, the parser, is the place where it becomes Pascal.