a semantic value (the value of an integer,
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
-* GLR Parsers:: Writing parsers for general context-free languages
+* GLR Parsers:: Writing parsers for general context-free languages.
+* Simple GLR Parsers:: Using GLR in its simplest form.
* Locations Overview:: Tracking Locations.
* Bison Parser:: What are Bison's input and output,
how is the output used?
a semantic value (the value of an integer,
the name of an identifier, etc.).
* Semantic Actions:: Each rule can have an action containing C code.
-* GLR Parsers:: Writing parsers for general context-free languages
+* GLR Parsers:: Writing parsers for general context-free languages.
+* Simple GLR Parsers:: Using GLR in its simplest form.
* Locations Overview:: Tracking Locations.
* Bison Parser:: What are Bison's input and output,
how is the output used?
%@}
@end example
+@node Simple GLR Parsers
+@section Using @acronym{GLR} in its Simplest Form
+@cindex @acronym{GLR} parsing, unambiguous grammars
+@cindex generalized @acronym{LR} (@acronym{GLR}) parsing, unambiguous grammars
+@findex %glr-parser
+@findex %expect-rr
+@cindex conflicts
+@cindex reduce/reduce conflicts
+
+The C++ example for @acronym{GLR} (@pxref{GLR Parsers}) explains how to use
+the @acronym{GLR} parsing algorithm with some advanced features such as
+@samp{%dprec} and @samp{%merge} to handle syntactically ambiguous
+grammars. However, the @acronym{GLR} algorithm can also be used in a simpler
+way to parse grammars that are unambiguous, but fail to be @acronym{LALR}(1).
+Such grammars typically require more than one symbol of lookahead,
+or (in rare cases) fall into the category of grammars in which the
+@acronym{LALR}(1) algorithm throws away too much information (they are in
+@acronym{LR}(1), but not @acronym{LALR}(1), @ref{Mystery Conflicts}).
+
+Here is an example of this situation, using a problem that
+arises in the declaration of enumerated and subrange types in the
+programming language Pascal. These declarations look like this:
+
+@example
+type subrange = lo .. hi;
+type enum = (a, b, c);
+@end example
+
+@noindent
+The original language standard allows only numeric
+literals and constant identifiers for the subrange bounds (@samp{lo}
+and @samp{hi}), but Extended Pascal (ISO/IEC 10206:1990) and many other
+Pascal implementations allow arbitrary expressions there. This gives
+rise to the following situation, containing a superfluous pair of
+parentheses:
+
+@example
+type subrange = (a) .. b;
+@end example
+
+@noindent
+Compare this to the following declaration of an enumerated
+type with only one value:
+
+@example
+type enum = (a);
+@end example
+
+@noindent
+(These declarations are contrived, but they are syntactically
+valid, and more-complicated cases can come up in practical programs.)
+
+These two declarations look identical until the @samp{..} token.
+With normal @acronym{LALR}(1) one-token look-ahead it is not
+possible to decide between the two forms when the identifier
+@samp{a} is parsed. It is, however, desirable
+for a parser to decide this, since in the latter case
+@samp{a} must become a new identifier to represent the enumeration
+value, while in the former case @samp{a} must be evaluated with its
+current meaning, which may be a constant or even a function call.
+
+You could parse @samp{(a)} as an ``unspecified identifier in parentheses'',
+to be resolved later, but this typically requires substantial
+contortions in both semantic actions and large parts of the
+grammar, where the parentheses are nested in the recursive rules for
+expressions.
+
+You might think of using the lexer to distinguish between the two
+forms by returning different tokens for currently defined and
+undefined identifiers. But if these declarations occur in a local
+scope, and @samp{a} is defined in an outer scope, then both forms
+are possible---either locally redefining @samp{a}, or using the
+value of @samp{a} from the outer scope. So this approach cannot
+work.
+
+A solution to this problem is to use a @acronym{GLR} parser in its simplest
+form, i.e., without using special features such as @samp{%dprec} and
+@samp{%merge}. When the @acronym{GLR} parser reaches the critical state, it
+simply splits into two branches and pursues both syntax rules
+simultaneously. Sooner or later, one of them runs into a parsing
+error. If there is a @samp{..} token before the next
+@samp{;}, the rule for enumerated types fails since it cannot
+accept @samp{..} anywhere; otherwise, the subrange type rule
+fails since it requires a @samp{..} token. So one of the branches
+fails silently, and the other one continues normally, performing
+all the intermediate actions that were postponed during the split.
+
+If the input is syntactically incorrect, both branches fail and the parser
+reports a syntax error as usual.
+
+The effect of all this is that the parser seems to ``guess'' the
+correct branch to take, or in other words, it seems to use more
+look-ahead than the underlying @acronym{LALR}(1) algorithm actually allows
+for. In this example, @acronym{LALR}(2) would suffice, but also some cases
+that are not @acronym{LALR}(@math{k}) for any @math{k} can be handled this way.
+
+Since there can be only two branches and at least one of them
+must fail, you need not worry about merging the branches by
+using dynamic precedence or @samp{%merge}.
+
+Another potential problem of @acronym{GLR} does not arise here, either. In
+general, a @acronym{GLR} parser can take quadratic or cubic worst-case time,
+and the current Bison parser even takes exponential time and space
+for some grammars. In practice, this rarely happens, and for many
+grammars it is possible to prove that it cannot happen. In
+in the present example, there is only one conflict between two
+rules, and the type-declaration context where the conflict
+arises cannot be nested. So the number of
+branches that can exist at any time is limited by the constant 2,
+and the parsing time is still linear.
+
+So here we have a case where we can use the benefits of @acronym{GLR}, almost
+without disadvantages. There are two things to note, though.
+First, one should carefully analyze the conflicts reported by
+Bison to make sure that @acronym{GLR} splitting is done only where it is
+intended to be. A @acronym{GLR} parser splitting inadvertently may cause
+problems less obvious than an @acronym{LALR} parser statically choosing the
+wrong alternative in a conflict.
+
+Second, interactions with the lexer (@pxref{Semantic Tokens}) must
+be considered with great care. Since a split parser consumes tokens
+without performing any actions during the split, the lexer cannot
+obtain information via parser actions. Some cases of
+lexer interactions can simply be eliminated by using @acronym{GLR}, i.e.,
+shifting the complications from the lexer to the parser. Remaining
+cases have to be checked for safety.
+
+In our example, it would be safe for the lexer to return tokens
+based on their current meanings in some symbol table, because no new
+symbols are defined in the middle of a type declaration. Though it
+is possible for a parser to define the enumeration
+constants as they are parsed, before the type declaration is
+completed, it actually makes no difference since they cannot be used
+within the same enumerated type declaration.
+
+Here is a Bison grammar corresponding to the example above. It
+parses a vastly simplified form of Pascal type declarations.
+
+@example
+%token TYPE DOTDOT ID
+
+@group
+%left '+' '-'
+%left '*' '/'
+@end group
+
+%%
+
+@group
+type_decl:
+ TYPE ID '=' type ';'
+;
+@end group
+
+@group
+type: '(' id_list ')'
+ | expr DOTDOT expr
+;
+@end group
+
+@group
+id_list: ID
+ | id_list ',' ID
+;
+@end group
+
+@group
+expr: '(' expr ')'
+ | expr '+' expr
+ | expr '-' expr
+ | expr '*' expr
+ | expr '/' expr
+ | ID
+;
+@end group
+@end example
+
+When used as a normal @acronym{LALR}(1) grammar, Bison correctly complains
+about one reduce/reduce conflict. In the conflicting situation the
+parser chooses one of the alternatives, arbitrarily the one
+declared first. Therefore the following correct input is not
+recognized:
+
+@example
+type t = (a) .. b;
+@end example
+
+The parser can be turned into a @acronym{GLR} parser, while also telling Bison
+to be silent about the one known reduce/reduce conflict, simply by
+adding these two declarations to the Bison input file:
+
+@example
+%glr-parser
+%expect-rr 1
+@end example
+
+@noindent
+No change in the grammar itself is required. Now the
+parser recognizes all valid declarations, according to the
+limited syntax above, transparently. In fact, the user does not even
+notice when the parser splits.
+
@node Locations Overview
@section Locations
@cindex location
For example, this:
@example
-exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
+exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{} ;
@end example
@noindent
exp: NUM
| exp exp '+' @{ $$ = $1 + $2; @}
| @dots{}
+;
@end example
@noindent