* doc/cppinternals.texi: Update.

From-SVN: r46050
Date: 2001-10-06 11:29:51 +00:00
commit 5b810d3c83
parent d644be7b4c
2 changed files with 65 additions and 47 deletions
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2001-10-06  Neil Booth  <neil@daikokuya.demon.co.uk>
+
+	* doc/cppinternals.texi: Update.
+
 2001-10-06  Zack Weinberg  <zack@codesourcery.com>

 	* gcc.c (main): Set this_file_error if the appropriate
--- a/gcc/doc/cppinternals.texi
+++ b/gcc/doc/cppinternals.texi
@@ -41,7 +41,7 @@ into another language, under the above conditions for modified versions.
@titlepage
@c @finalout
@title Cpplib Internals
-@subtitle Last revised September 2001
+@subtitle Last revised October 2001
@subtitle for GCC version 3.1
@author Neil Booth
@page
@@ -71,7 +71,7 @@ into another language, under the above conditions for modified versions.
@chapter Cpplib---the core of the GNU C Preprocessor

 The GNU C preprocessor in GCC 3.x has been completely rewritten.  It is
-now implemented as a library, cpplib, so it can be easily shared between
+now implemented as a library, @dfn{cpplib}, so it can be easily shared between
 a stand-alone preprocessor, and a preprocessor integrated with the C,
 C++ and Objective-C front ends.  It is also available for use by other
 programs, though this is not recommended as its exposed interface has
@@ -498,12 +498,13 @@ both for aesthetic reasons and because it causes problems for people who
 still try to abuse the preprocessor for things like Fortran source and
 Makefiles.

-For now, just notice that the only places we need to be careful about
-@dfn{paste avoidance} are when tokens are added (or removed) from the
-original token stream.  This only occurs because of macro expansion, but
-care is needed in many places: before @strong{and} after each macro
-replacement, each argument replacement, and additionally each token
-created by the @samp{#} and @samp{##} operators.
+For now, just notice that when tokens are added to (or removed from, as
+shown by the @code{EMPTY} example) the original lexed token stream, we
+need to check for accidental token pasting.  We call this @dfn{paste
+avoidance}.  Token addition and removal can only occur because of macro
+expansion, but accidental pasting can occur in many places: both before
+and after each macro replacement, each argument replacement, and
+additionally each token created by the @samp{#} and @samp{##} operators.

 Let's look at how the preprocessor gets whitespace output correct
 normally.  The @code{cpp_token} structure contains a flags byte, and one
@@ -512,7 +513,7 @@ indicates that the token was preceded by whitespace of some form other
 than a new line.  The stand-alone preprocessor can use this flag to
 decide whether to insert a space between tokens in the output.

-Now consider the following:
+Now consider the result of the following macro expansion:

@smallexample
 #define add(x, y, z) x + y +z;
@@ -524,20 +525,21 @@ The interesting thing here is that the tokens @samp{1} and @samp{2} are
 output with a preceding space, and @samp{3} is output without a
 preceding space, but when lexed none of these tokens had that property.
 Careful consideration reveals that @samp{1} gets its preceding
-whitespace from the space preceding @samp{add} in the macro
-@emph{invocation}, @samp{2} gets its whitespace from the space preceding
-the parameter @samp{y} in the macro @emph{replacement list}, and
-@samp{3} has no preceding space because parameter @samp{z} has none in
-the replacement list.
+whitespace from the space preceding @samp{add} in the macro invocation,
+@emph{not} the replacement list.  @samp{2} gets its whitespace from the
+space preceding the parameter @samp{y} in the macro replacement list,
+and @samp{3} has no preceding space because parameter @samp{z} has none
+in the replacement list.

 Once lexed, tokens are effectively fixed and cannot be altered, since
 pointers to them might be held in many places, in particular by
 in-progress macro expansions.  So instead of modifying the two tokens
 above, the preprocessor inserts a special token, which I call a
-@dfn{padding token}, into the token stream in front of every macro
-expansion and expanded macro argument, to indicate that the subsequent
-token should assume its @code{PREV_WHITE} flag from a different
-@dfn{source token}.  In the above example, the source tokens are
+@dfn{padding token}, into the token stream to indicate that the spacing
+of the subsequent token is special.  The preprocessor inserts padding
+tokens in front of every macro expansion and expanded macro argument.
+These point to a @dfn{source token} from which the subsequent real token
+should inherit its spacing.  In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
 macro replacement list, respectively.

@@ -551,10 +553,14 @@ a macro's first replacement token expands straight into another macro.
        @expansion{} [baz]
@end smallexample

-Here, two padding tokens with sources @samp{foo} between the brackets,
-and @samp{bar} from foo's replacement list, are generated.  Clearly the
-first padding token is the one that matters.  But what if we happen to
-leave a macro expansion?  Adjusting the above example slightly:
+Here, two padding tokens are generated, whose sources are the @samp{foo}
+token between the brackets and the @samp{bar} token from @samp{foo}'s
+replacement list, respectively.  Clearly the first padding token is the
+one we should use, so our output code should contain a rule that the
+first padding token in a sequence is the one that matters.
+
+But what if we happen to leave a macro expansion?  Adjusting the above
+example slightly:

@smallexample
 #define foo bar
@@ -564,33 +570,41 @@ leave a macro expansion?  Adjusting the above example slightly:
        @expansion{} [ baz] ;
@end smallexample

-As shown, now there should be a space before baz and the semicolon.  Our
-initial algorithm fails for the former, because we would see three
-padding tokens, one per macro invocation, followed by @samp{baz}, which
-would have inherit its spacing from the original source, @samp{foo},
-which has no leading space.  Note that it is vital that cpplib get
-spacing correct in these examples, since any of these macro expansions
-could be stringified, where spacing matters.
+As shown, now there should be a space before @samp{baz} and the
+semicolon in the output.

-So, I have demonstrated that not just entering macro and argument
-expansions, but leaving them requires special handling too.  So cpplib
-inserts a padding token with a @code{NULL} source token when leaving
-macro expansions and after each replaced argument in a macro's
-replacement list.  It also inserts appropriate padding tokens on either
-side of tokens created by the @samp{#} and @samp{##} operators.
+The rule we decided on above fails for @samp{baz}: we generate three
+padding tokens, one per macro invocation, before the token @samp{baz}.
+We would then have it take its spacing from the first of these, which
+carries source token @samp{foo} with no leading space.

-Now we can see the relationship with paste avoidance: we have to be
-careful about paste avoidance in exactly the same locations we take care
-to get white space correct.  This makes implementation of paste
-avoidance easy: wherever the stand-alone preprocessor is fixing up
-spacing because of padding tokens, and it turns out that no space is
-needed, it has to take the extra step to check that a space is not
-needed after all to avoid an accidental paste.  The function
-@code{cpp_avoid_paste} advises whether a space is required between two
-consecutive tokens.  To avoid excessive spacing, it tries hard to only
-require a space if one is likely to be necessary, but for reasons of
-efficiency it is slightly conservative and might recommend a space where
-one is not strictly needed.
+It is vital that cpplib get spacing correct in these examples since any
+of these macro expansions could be stringified, where spacing matters.
+
+So, this demonstrates that not only entering macro and argument
+expansions, but also leaving them, requires special handling.  I made
+cpplib insert a padding token with a @code{NULL} source token when
+leaving macro expansions, as well as after each replaced argument in a
+macro's replacement list.  It also inserts appropriate padding tokens on
+either side of tokens created by the @samp{#} and @samp{##} operators.
+I expanded the rule so that, if we see a padding token with a
+@code{NULL} source token, @emph{and} the source token recorded from an
+earlier padding token has no leading space, then we behave as if we
+have seen no padding tokens at all.  A quick check shows this rule will
+then get the above example correct as well.
+
+Now a relationship with paste avoidance is apparent: we have to be
+careful about paste avoidance in exactly the same locations we have
+padding tokens in order to get white space correct.  This makes
+implementation of paste avoidance easy: wherever the stand-alone
+preprocessor is fixing up spacing because of padding tokens, and it
+turns out that no space is needed, it has to take the extra step to
+check that a space is not needed after all to avoid an accidental paste.
+The function @code{cpp_avoid_paste} advises whether a space is required
+between two consecutive tokens.  To avoid excessive spacing, it tries
+hard to only require a space if one is likely to be necessary, but for
+reasons of efficiency it is slightly conservative and might recommend a
+space where one is not strictly needed.

@node Line Numbering
@unnumbered Line numbering
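The revised text describes two spacing rules and a paste check. Together they fit in a few dozen lines. What follows is a minimal sketch in C, not cpplib's implementation. The names struct tok, its prev_white field (standing in for PREV_WHITE), print_tokens, append and would_paste are hypothetical and chosen only for illustration. In particular, would_paste is a much cruder, character-based stand-in for cpp_avoid_paste. The sketch assumes padding tokens already sit in the stream, each pointing either at its source token or at NULL.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified token model for illustration only;
   this is not cpplib's cpp_token.  */
enum tok_type { TOK_REAL, TOK_PADDING };

struct tok
{
  enum tok_type type;
  const char *spelling;       /* Text of a real token; "" for padding.  */
  int prev_white;             /* Like PREV_WHITE: whitespace came before.  */
  const struct tok *source;   /* Padding only: spacing source, may be NULL.  */
};

/* Hypothetical, deliberately conservative stand-in for cpp_avoid_paste.
   It looks only at the last character of PREV and the first of CUR, so
   it may ask for a space that is not strictly needed.  */
static int
would_paste (const char *prev, const char *cur)
{
  static const char *const pairs[] = {
    "++", "--", "+=", "-=", "->", "*=", "/=", "%=", "&&", "&=", "||",
    "|=", "^=", "<<", "<=", ">>", ">=", "==", "!=", "##", "::", "//",
    "/*", "..", NULL
  };
  size_t n = strlen (prev);
  if (n == 0 || cur[0] == '\0')
    return 0;
  unsigned char a = (unsigned char) prev[n - 1];
  unsigned char b = (unsigned char) cur[0];

  /* Identifier and number characters would run together.  */
  if ((isalnum (a) || a == '_') && (isalnum (b) || b == '_'))
    return 1;
  /* "1" "." or "." "5" would form a single preprocessing number.  */
  if ((isdigit (a) && b == '.') || (a == '.' && isdigit (b)))
    return 1;
  /* Two punctuator characters that form a longer punctuator or comment.  */
  for (size_t i = 0; pairs[i] != NULL; i++)
    if (pairs[i][0] == a && pairs[i][1] == b)
      return 1;
  return 0;
}

/* Append S to OUT (capacity SIZE, current length *LEN), truncating
   if necessary; OUT stays NUL-terminated.  */
static void
append (char *out, size_t size, size_t *len, const char *s)
{
  int w = snprintf (out + *len, size - *len, "%s", s);
  if (w > 0)
    *len += ((size_t) w < size - *len) ? (size_t) w : size - *len - 1;
}

/* Print the real tokens of TOKS into OUT, taking spacing from padding
   tokens according to the two rules, and checking for accidental
   pastes wherever padding was seen.  */
static void
print_tokens (const struct tok *toks, size_t n, char *out, size_t size)
{
  const struct tok *source = NULL;  /* Spacing source from padding.  */
  const struct tok *prev = NULL;    /* Last real token printed.  */
  int saw_padding = 0;
  size_t len = 0;

  assert (size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++)
    {
      const struct tok *t = &toks[i];

      if (t->type == TOK_PADDING)
        {
          saw_padding = 1;
          /* Record a source if we have none, so the first padding token
             with a source wins.  A NULL-source padding token seen while
             the recorded source has no leading space resets the spacing
             source, as though no padding had been seen.  */
          if (source == NULL
              || (!source->prev_white && t->source == NULL))
            source = t->source;
          continue;
        }

      /* With no recorded source, the token keeps its own spacing.  */
      const struct tok *spacing = source != NULL ? source : t;
      int space = spacing->prev_white;
      /* Only where padding was seen can tokens have become adjacent
         that were not adjacent when lexed, so only check there.  */
      if (!space && saw_padding && prev != NULL)
        space = would_paste (prev->spelling, t->spelling);

      if (space && len > 0)
        append (out, size, &len, " ");
      append (out, size, &len, t->spelling);
      prev = t;
      source = NULL;
      saw_padding = 0;
    }
}
```

As a check, consider a hand-built stream for an assumed invocation [foo] EMPTY; with the assumed definitions foo as bar, bar as EMPTY baz, and EMPTY as empty. It prints as "[ baz] ;". The NULL padding token emitted on leaving EMPTY cancels the spaceless source foo, so baz keeps its own leading space. Padding tokens hold a pointer to their source rather than a copy of its flags. This reflects the point made earlier that lexed tokens cannot be modified once other expansions may be pointing at them.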