@titlepage
@c @finalout
@title Cpplib Internals
-@subtitle Last revised September 2001
+@subtitle Last revised October 2001
@subtitle for GCC version 3.1
@author Neil Booth
@page
@chapter Cpplib---the core of the GNU C Preprocessor
The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
-now implemented as a library, cpplib, so it can be easily shared between
+now implemented as a library, @dfn{cpplib}, so it can be easily shared between
a stand-alone preprocessor, and a preprocessor integrated with the C,
C++ and Objective-C front ends. It is also available for use by other
programs, though this is not recommended as its exposed interface has
still try to abuse the preprocessor for things like Fortran source and
Makefiles.
-For now, just notice that the only places we need to be careful about
-@dfn{paste avoidance} are when tokens are added (or removed) from the
-original token stream. This only occurs because of macro expansion, but
-care is needed in many places: before @strong{and} after each macro
-replacement, each argument replacement, and additionally each token
-created by the @samp{#} and @samp{##} operators.
+For now, just notice that when tokens are added to (or removed from, as
+shown by the @code{EMPTY} example) the original lexed token stream, we
+need to check for accidental token pasting. We call this @dfn{paste
+avoidance}. Token addition and removal can only occur because of macro
+expansion, but accidental pasting can occur in many places: both before
+and after each macro replacement and each argument replacement, and
+around each token created by the @samp{#} and @samp{##} operators.
Let's look at how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one
than a new line. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
-Now consider the following:
+Now consider the result of the following macro expansion:
@smallexample
#define add(x, y, z) x + y +z;
output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding
-whitespace from the space preceding @samp{add} in the macro
-@emph{invocation}, @samp{2} gets its whitespace from the space preceding
-the parameter @samp{y} in the macro @emph{replacement list}, and
-@samp{3} has no preceding space because parameter @samp{z} has none in
-the replacement list.
+whitespace from the space preceding @samp{add} in the macro invocation,
+@emph{not} from the replacement list. @samp{2} gets its whitespace from
+the space preceding the parameter @samp{y} in the macro replacement
+list, and @samp{3} has no preceding space because parameter @samp{z}
+has none in the replacement list.
Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
-@dfn{padding token}, into the token stream in front of every macro
-expansion and expanded macro argument, to indicate that the subsequent
-token should assume its @code{PREV_WHITE} flag from a different
-@dfn{source token}. In the above example, the source tokens are
+@dfn{padding token}, into the token stream to indicate that spacing of
+the subsequent token is special. The preprocessor inserts padding
+tokens in front of every macro expansion and expanded macro argument.
+These point to a @dfn{source token} from which the subsequent real token
+should inherit its spacing. In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively.
@expansion{} [baz]
@end smallexample
-Here, two padding tokens with sources @samp{foo} between the brackets,
-and @samp{bar} from foo's replacement list, are generated. Clearly the
-first padding token is the one that matters. But what if we happen to
-leave a macro expansion? Adjusting the above example slightly:
+Here, two padding tokens are generated: the first has as its source the
+@samp{foo} token between the brackets, and the second the @samp{bar}
+token from @samp{foo}'s replacement list. Clearly the first padding
+token is the one we should use, so our output code should contain a
+rule that the first padding token in a sequence is the one that matters.
+
+But what if we happen to leave a macro expansion? Adjusting the above
+example slightly:
@smallexample
#define foo bar
@expansion{} [ baz] ;
@end smallexample
-As shown, now there should be a space before baz and the semicolon. Our
-initial algorithm fails for the former, because we would see three
-padding tokens, one per macro invocation, followed by @samp{baz}, which
-would have inherit its spacing from the original source, @samp{foo},
-which has no leading space. Note that it is vital that cpplib get
-spacing correct in these examples, since any of these macro expansions
-could be stringified, where spacing matters.
-
-So, I have demonstrated that not just entering macro and argument
-expansions, but leaving them requires special handling too. So cpplib
-inserts a padding token with a @code{NULL} source token when leaving
-macro expansions and after each replaced argument in a macro's
-replacement list. It also inserts appropriate padding tokens on either
-side of tokens created by the @samp{#} and @samp{##} operators.
-
-Now we can see the relationship with paste avoidance: we have to be
-careful about paste avoidance in exactly the same locations we take care
-to get white space correct. This makes implementation of paste
-avoidance easy: wherever the stand-alone preprocessor is fixing up
-spacing because of padding tokens, and it turns out that no space is
-needed, it has to take the extra step to check that a space is not
-needed after all to avoid an accidental paste. The function
-@code{cpp_avoid_paste} advises whether a space is required between two
-consecutive tokens. To avoid excessive spacing, it tries hard to only
-require a space if one is likely to be necessary, but for reasons of
-efficiency it is slightly conservative and might recommend a space where
-one is not strictly needed.
+As shown, now there should be a space before @samp{baz} and the
+semicolon in the output.
+
+The rule we settled on above fails for @samp{baz}: we generate three
+padding tokens, one per macro invocation, before the token @samp{baz}.
+The rule would then have @samp{baz} take its spacing from the first of
+these, whose source token @samp{foo} has no leading space.
+
+It is vital that cpplib get spacing correct in these examples, since any
+of these macro expansions could be stringified, and spacing is
+significant in the resulting string literal.
+
+So this demonstrates that leaving macro and argument expansions, not
+just entering them, requires special handling too. I made cpplib
+insert a padding token with a @code{NULL} source token when leaving
+macro expansions, as well as after each replaced argument in a macro's
+replacement list. It also inserts appropriate padding tokens on either
+side of tokens created by the @samp{#} and @samp{##} operators. I
+expanded the rule so that, if we see a padding token with a @code{NULL}
+source token, @emph{and} the source token already recorded from an
+earlier padding token has no leading space, then we behave as if we
+have seen no padding tokens at all. A quick check shows that this rule
+then gets the above example correct as well.
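The spacing rule described above can be sketched in C. This is a minimal, hypothetical model, not cpplib's code. The type @code{struct tok}, its fields, and the function @code{spell_tokens} are invented for illustration. They stand in for @code{cpp_token}, its @code{PREV_WHITE} flag, and the stand-alone preprocessor's output loop. The sketch assumes the padding tokens have already been inserted into the stream as the text describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for cpplib's cpp_token (not its
   real layout).  A padding token only records the token it takes its
   spacing from, which may be NULL; a real token has a spelling and a
   flag standing in for PREV_WHITE.  */
struct tok
{
  const char *text;         /* spelling; unused for padding tokens */
  int prev_white;           /* nonzero if preceded by white space */
  int is_padding;           /* nonzero for a padding token */
  const struct tok *source; /* padding only: spacing source, or NULL */
};

/* Hypothetical output loop: spell the N tokens in TOKS into OUT (of
   size OUTSZ), printing a space before a real token only if the token
   that governs its spacing had leading white space.  */
static void
spell_tokens (const struct tok *toks, size_t n, char *out, size_t outsz)
{
  const struct tok *source = NULL; /* spacing source chosen so far */
  size_t len = 0;

  assert (outsz > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      const struct tok *t = &toks[i];

      if (t->is_padding)
        {
          /* Record the first non-NULL source in a run of padding.  A
             NULL-source padding token that follows a recorded source
             with no leading space resets the run, as if no padding
             had been seen.  */
          if (source == NULL
              || (!source->prev_white && t->source == NULL))
            source = t->source;
          continue;
        }

      /* No effective padding: the token uses its own flag.  */
      if (source == NULL)
        source = t;

      if (source->prev_white && len + 1 < outsz)
        out[len++] = ' ';

      /* Append the spelling, truncating if OUT is full.  */
      size_t tl = strlen (t->text);
      if (len + tl >= outsz)
        tl = outsz - 1 - len;
      memcpy (out + len, t->text, tl);
      len += tl;
      out[len] = '\0';

      /* Padding only affects the next real token.  */
      source = NULL;
    }
}
```

In this model, a padding token sourced from @samp{foo} (no leading space) followed by a @code{NULL}-source padding token cancels out. The next real token then falls back on its own @code{PREV_WHITE} flag. If the recorded source does have leading space, the @code{NULL}-source padding token is ignored.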
+
+Now a relationship with paste avoidance is apparent: we have to be
+careful about paste avoidance in exactly the same locations where we
+insert padding tokens to get white space correct. This makes
+implementation of paste avoidance easy: wherever the stand-alone
+preprocessor is fixing up spacing because of padding tokens, and it
+turns out that no space is needed, it has to take the extra step of
+checking whether a space is needed after all to avoid an accidental
+paste. The function @code{cpp_avoid_paste} advises whether a space is
+required between two consecutive tokens. To avoid excessive spacing, it
+tries hard to require a space only if one is likely to be necessary, but
+for reasons of efficiency it is slightly conservative and might
+recommend a space where one is not strictly needed.
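The following hypothetical, spelling-based approximation illustrates the kind of check @code{cpp_avoid_paste} performs. The real function works on cpplib token types and takes a @code{cpp_reader}. The name @code{needs_space` and its string-based interface are assumptions made for this sketch. It compares only the last character of the first token with the first character of the second. Like @code{cpp_avoid_paste}, it errs on the side of requesting a space. Its punctuator table is illustrative rather than exhaustive.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical, spelling-based approximation of a paste-avoidance
   check; NOT cpplib's cpp_avoid_paste, which works on token types.
   Returns nonzero if printing PREV immediately followed by NEXT might
   lex differently from the two separate tokens.  */
static int
needs_space (const char *prev, const char *next)
{
  /* Two-character starts of longer punctuators (illustrative list,
     including comment openers and some digraphs).  */
  static const char *const pairs[] = {
    "++", "+=", "--", "-=", "->", "<<", "<=", ">>", ">=", "==", "!=",
    "&&", "&=", "||", "|=", "*=", "/=", "%=", "^=", "##",
    "//", "/*", "..", "<:", "<%", ":>", "%>", "%:"
  };

  assert (prev != NULL && next != NULL);
  size_t plen = strlen (prev);
  if (plen == 0 || next[0] == '\0')
    return 0;

  unsigned char a = (unsigned char) prev[plen - 1];
  unsigned char b = (unsigned char) next[0];
  int a_word = isalnum (a) || a == '_';
  int b_word = isalnum (b) || b == '_';

  /* Identifiers and numbers run together: "int" "x", "1" "2".  */
  if (a_word && b_word)
    return 1;
  /* pp-numbers absorb '.', and a prefix such as L can turn a quote
     into a wide literal: "L" "\"s\"".  */
  if (a_word && (b == '.' || b == '"' || b == '\''))
    return 1;
  /* "." followed by a digit starts a pp-number: "." "5".  */
  if (a == '.' && isdigit (b))
    return 1;
  /* A pp-number ending in an exponent letter absorbs a sign.  */
  if (isdigit ((unsigned char) prev[0])
      && (a == 'e' || a == 'E' || a == 'p' || a == 'P')
      && (b == '+' || b == '-'))
    return 1;

  /* Two punctuator characters that begin a longer punctuator.  */
  char two[3] = { (char) a, (char) b, '\0' };
  for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
    if (strcmp (two, pairs[i]) == 0)
      return 1;
  return 0;
}
```

Looking at single characters keeps the check cheap but makes it conservative. For example, @samp{a} followed by @samp{.} is reported as needing a space even though @samp{a.} would lex the same way.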
@node Line Numbering
@unnumbered Line numbering