Node:Parsing subranges, Next:Tree nodes, Previous:Parsing keywords, Up:Internals
Extended Pascal allows arbitrary expressions as the lower bounds of
subrange types. This leads to some following parsing difficulties:
type a = (expr1) .. expr2;
(if expr1
starts with an identifier) vs.
type a = (enum1, enum2);
If the enum type contains at least two items, we get no real conflicts, but what the bison manual calls "mystery conflicts", i.e. our grammer is LR(1), but not LALR(1) which bison requires, Mystery Conflicts.
Our solution is the one suggested in the bison manual, to add a
bogus rule. For that we add a new terminal BOGUS
which is
never used and a new nonterminal conflict_id
which contains
the identifiers that are responsible for the six conflicts.
It gets more difficult if the enum type has only one item, i.e.:
type a = (enum1);
If further expr1
consists of a single identifier, the
conflict cannot be resolved without reading the token following the
right parenthesis. (This is inherent in the EP language.)
But already after reading the identifier (expr1
or
enum1
), our parser has to decide whether to reduce it to an
expression or to keep it as an identifier. (The alternative would be
to define an expression which is anything but a single identifier,
and parse (identifier)
as a distinct thing, but this
would get quite hairy.)
We resolve it with a lexer hack. The lexer turns a right parenthesis
which is followed by ..
into the new token LEX_RPAR
.
Most places in the parser treat LEX_RPAR
and )
as
equivalent (nonterminal rpar
). However, enum types allow only
)
(they can never be followed by ..
), and the lower
bound of a subrange allows only LEX_RPAR
(it is always
followed by ..
). This resolves the conflict.
But there are more conflicts if the lower bound is not enclosed in
parentheses:
type a = Foo (42) .. expr2;
(where Foo
can be one of certain built-in functions such as
Sqr
, or a type name for a type-cast) vs.
type a = Bar (42);
(where Bar
is an undiscriminated schema type).
To resolve this, we let the lexer return a special token
LEX_SCHEMA
for identifiers which correspond to
undiscriminated schema types. The parser accepts this token in
new_identifier
(so schema identifiers can be redefined) and
typename
(e.g. for parameters), but not in id
(which
appears in expressions) where undiscriminated schema types are
invalid.
The last conflict:
type a = @Foo = (expr) .. expr2;
(where @
is the BP address operator - the
= (expr)
is there to create an ordinal (namely,
Boolean) expression that starts with the address operator) vs.
type a = @Bar = (expr);
(where @
is a lexical alternative for ^
, according to
the standards).
The conflict arises already with the @
token. The =
(as a comparison operator in the first case, and for a type
initializer - EP extension, combined with a BP extension of using
=
instead of value
) just adds to the problem. Since
expr can be arbitrary long, the conflict is in fact not
solvable with any fixed number of lookup tokens.
This conflict is decided using parser precedence rules, in favour of the latter interpretation. (BP itself can't parse the supposedly BP compatible former syntax.)