* doc/cppinternals.texi: Update.
From-SVN: r46050
parent d644be7b4c
commit 5b810d3c83
2 changed files with 65 additions and 47 deletions

@@ -1,3 +1,7 @@
2001-10-06 Neil Booth <neil@daikokuya.demon.co.uk>

* doc/cppinternals.texi: Update.

2001-10-06 Zack Weinberg <zack@codesourcery.com>

* gcc.c (main): Set this_file_error if the appropriate
@@ -41,7 +41,7 @@ into another language, under the above conditions for modified versions.
@titlepage
@c @finalout
@title Cpplib Internals
@subtitle Last revised September 2001
@subtitle Last revised October 2001
@subtitle for GCC version 3.1
@author Neil Booth
@page
@@ -71,7 +71,7 @@ into another language, under the above conditions for modified versions.
@chapter Cpplib---the core of the GNU C Preprocessor

The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
now implemented as a library, cpplib, so it can be easily shared between
now implemented as a library, @dfn{cpplib}, so it can be easily shared between
a stand-alone preprocessor, and a preprocessor integrated with the C,
C++ and Objective-C front ends. It is also available for use by other
programs, though this is not recommended as its exposed interface has
@@ -498,12 +498,13 @@ both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.

For now, just notice that the only places we need to be careful about
@dfn{paste avoidance} are when tokens are added (or removed) from the
original token stream. This only occurs because of macro expansion, but
care is needed in many places: before @strong{and} after each macro
replacement, each argument replacement, and additionally each token
created by the @samp{#} and @samp{##} operators.
For now, just notice that when tokens are added (or removed, as shown by
the @code{EMPTY} example) from the original lexed token stream, we need
to check for accidental token pasting. We call this @dfn{paste
avoidance}. Token addition and removal can only occur because of macro
expansion, but accidental pasting can occur in many places: both before
and after each macro replacement, each argument replacement, and
additionally each token created by the @samp{#} and @samp{##} operators.

Let's look at how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one
@@ -512,7 +513,7 @@ indicates that the token was preceded by whitespace of some form other
than a new line. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.

Now consider the following:
Now consider the result of the following macro expansion:

@smallexample
#define add(x, y, z) x + y +z;
@@ -524,20 +525,21 @@ The interesting thing here is that the tokens @samp{1} and @samp{2} are
output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding
whitespace from the space preceding @samp{add} in the macro
@emph{invocation}, @samp{2} gets its whitespace from the space preceding
the parameter @samp{y} in the macro @emph{replacement list}, and
@samp{3} has no preceding space because parameter @samp{z} has none in
the replacement list.
whitespace from the space preceding @samp{add} in the macro invocation,
@emph{not} replacement list. @samp{2} gets its whitespace from the
space preceding the parameter @samp{y} in the macro replacement list,
and @samp{3} has no preceding space because parameter @samp{z} has none
in the replacement list.

Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
@dfn{padding token}, into the token stream in front of every macro
expansion and expanded macro argument, to indicate that the subsequent
token should assume its @code{PREV_WHITE} flag from a different
@dfn{source token}. In the above example, the source tokens are
@dfn{padding token}, into the token stream to indicate that spacing of
the subsequent token is special. The preprocessor inserts padding
tokens in front of every macro expansion and expanded macro argument.
These point to a @dfn{source token} from which the subsequent real token
should inherit its spacing. In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively.

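The padding-token idea in the hunk above can be sketched in C. This is a minimal illustration under stated assumptions, not cpplib code. The following names are all hypothetical:

- the types tok and tok_type;
- the flag PREV_WHITE_FLAG, which stands in for the PREV_WHITE flag the text mentions;
- the functions render and demo_add_expansion.

Each padding token points at a source token, and the next real token takes its spacing from that source instead of from its own flag. Only the simple case of at most one padding token before each real token is handled, and the token stream is built by hand rather than by a real preprocessor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit: the token was preceded by whitespace when lexed. */
#define PREV_WHITE_FLAG 0x1

enum tok_type { TOK_REAL, TOK_PADDING };

/* Simplified token: real tokens carry a spelling and flags, padding
   tokens only point at the source token to take spacing from. */
struct tok {
  enum tok_type type;
  const char *spelling;      /* real tokens only; NULL for padding */
  unsigned char flags;       /* PREV_WHITE_FLAG or 0 */
  const struct tok *source;  /* padding tokens only */
};

/* Append S to the NUL-terminated string OUT without overflowing it. */
static void append(char *out, size_t out_size, const char *s)
{
  strncat(out, s, out_size - strlen(out) - 1);
}

/* Print N tokens into OUT.  A real token that follows a padding token
   takes its spacing from that padding token's source; otherwise it uses
   its own PREV_WHITE_FLAG.  Assumes at most one padding token before
   each real token. */
void render(const struct tok *toks, size_t n, char *out, size_t out_size)
{
  const struct tok *source = NULL;

  assert(out_size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    if (toks[i].type == TOK_PADDING) {
      source = toks[i].source;   /* spacing comes from here, not the token */
      continue;
    }
    const struct tok *spacing = source ? source : &toks[i];
    if (spacing->flags & PREV_WHITE_FLAG)
      append(out, out_size, " ");
    append(out, out_size, toks[i].spelling);
    source = NULL;
  }
}

/* Hand-built stream loosely modelled on the add() expansion: 1, 2 and 3
   were lexed with no leading whitespace, and each follows a padding
   token whose source is add, y or z respectively. */
void demo_add_expansion(char *out, size_t out_size)
{
  static const struct tok add = {TOK_REAL, "add", PREV_WHITE_FLAG, NULL};
  static const struct tok y = {TOK_REAL, "y", PREV_WHITE_FLAG, NULL};
  static const struct tok z = {TOK_REAL, "z", 0, NULL};
  const struct tok stream[] = {
    {TOK_PADDING, NULL, 0, &add}, {TOK_REAL, "1", 0, NULL},
    {TOK_REAL, "+", PREV_WHITE_FLAG, NULL},
    {TOK_PADDING, NULL, 0, &y}, {TOK_REAL, "2", 0, NULL},
    {TOK_REAL, "+", PREV_WHITE_FLAG, NULL},
    {TOK_PADDING, NULL, 0, &z}, {TOK_REAL, "3", 0, NULL},
    {TOK_REAL, ";", 0, NULL},
  };
  render(stream, sizeof stream / sizeof stream[0], out, out_size);
}
```

With this hand-built stream, demo_add_expansion yields " 1 + 2 +3;". Tokens 1 and 2 pick up a space from add and y. Token 3 gets none because z has no leading whitespace in the replacement list. The leading space comes from add, which in this hypothetical setup is preceded by whitespace.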
@@ -551,10 +553,14 @@ a macro's first replacement token expands straight into another macro.
@expansion{} [baz]
@end smallexample

Here, two padding tokens with sources @samp{foo} between the brackets,
and @samp{bar} from foo's replacement list, are generated. Clearly the
first padding token is the one that matters. But what if we happen to
leave a macro expansion? Adjusting the above example slightly:
Here, two padding tokens are generated with sources the @samp{foo} token
between the brackets, and the @samp{bar} token from foo's replacement
list, respectively. Clearly the first padding token is the one we
should use, so our output code should contain a rule that the first
padding token in a sequence is the one that matters.

But what if we happen to leave a macro expansion? Adjusting the above
example slightly:

@smallexample
#define foo bar
@@ -564,33 +570,41 @@ leave a macro expansion? Adjusting the above example slightly:
@expansion{} [ baz] ;
@end smallexample

As shown, now there should be a space before baz and the semicolon. Our
initial algorithm fails for the former, because we would see three
padding tokens, one per macro invocation, followed by @samp{baz}, which
would have inherit its spacing from the original source, @samp{foo},
which has no leading space. Note that it is vital that cpplib get
spacing correct in these examples, since any of these macro expansions
could be stringified, where spacing matters.
As shown, now there should be a space before @samp{baz} and the
semicolon in the output.

So, I have demonstrated that not just entering macro and argument
expansions, but leaving them requires special handling too. So cpplib
inserts a padding token with a @code{NULL} source token when leaving
macro expansions and after each replaced argument in a macro's
replacement list. It also inserts appropriate padding tokens on either
side of tokens created by the @samp{#} and @samp{##} operators.
The rules we decided above fail for @samp{baz}: we generate three
padding tokens, one per macro invocation, before the token @samp{baz}.
We would then have it take its spacing from the first of these, which
carries source token @samp{foo} with no leading space.

Now we can see the relationship with paste avoidance: we have to be
careful about paste avoidance in exactly the same locations we take care
to get white space correct. This makes implementation of paste
avoidance easy: wherever the stand-alone preprocessor is fixing up
spacing because of padding tokens, and it turns out that no space is
needed, it has to take the extra step to check that a space is not
needed after all to avoid an accidental paste. The function
@code{cpp_avoid_paste} advises whether a space is required between two
consecutive tokens. To avoid excessive spacing, it tries hard to only
require a space if one is likely to be necessary, but for reasons of
efficiency it is slightly conservative and might recommend a space where
one is not strictly needed.
It is vital that cpplib get spacing correct in these examples since any
of these macro expansions could be stringified, where spacing matters.

So, this demonstrates that not just entering macro and argument
expansions, but leaving them requires special handling too. I made
cpplib insert a padding token with a @code{NULL} source token when
leaving macro expansions, as well as after each replaced argument in a
macro's replacement list. It also inserts appropriate padding tokens on
either side of tokens created by the @samp{#} and @samp{##} operators.
I expanded the rule so that, if we see a padding token with a
@code{NULL} source token, @emph{and} that source token has no leading
space, then we behave as if we have seen no padding tokens at all. A
quick check shows this rule will then get the above example correct as
well.

Now a relationship with paste avoidance is apparent: we have to be
careful about paste avoidance in exactly the same locations we have
padding tokens in order to get white space correct. This makes
implementation of paste avoidance easy: wherever the stand-alone
preprocessor is fixing up spacing because of padding tokens, and it
turns out that no space is needed, it has to take the extra step to
check that a space is not needed after all to avoid an accidental paste.
The function @code{cpp_avoid_paste} advises whether a space is required
between two consecutive tokens. To avoid excessive spacing, it tries
hard to only require a space if one is likely to be necessary, but for
reasons of efficiency it is slightly conservative and might recommend a
space where one is not strictly needed.

@node Line Numbering
@unnumbered Line numbering
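The complete output rules described in the hunk above can be sketched in C. This is a hypothetical illustration, not cpplib's implementation. It applies three rules:

- The first padding token in a sequence decides spacing.
- A NULL-source padding token cancels a remembered source that has no leading space.
- A paste check runs wherever padding was seen.

All names (tok, PREV_WHITE_FLAG, needs_space, render) are assumptions. needs_space is a deliberately crude stand-in that works on token spellings; it is not cpplib's cpp_avoid_paste. After a NULL-source padding token resets the source, the sketch still runs the paste check. That is a conservative choice of this sketch, not something the text specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit: the token was preceded by whitespace when lexed. */
#define PREV_WHITE_FLAG 0x1

enum tok_type { TOK_REAL, TOK_PADDING };

struct tok {
  enum tok_type type;
  const char *spelling;      /* real tokens only */
  unsigned char flags;       /* PREV_WHITE_FLAG or 0 */
  const struct tok *source;  /* padding only; NULL when leaving an expansion */
};

static int ident_char(char c)
{
  return isalnum((unsigned char) c) || c == '_';
}

/* Crude paste check on spellings; hypothetical, not cpp_avoid_paste.
   Returns 1 if NEXT printed straight after PREV might lex as different
   tokens.  It may ask for an unneeded space, and it only knows a handful
   of common C cases. */
int needs_space(const char *prev, const char *next)
{
  size_t len = strlen(prev);
  if (len == 0 || next[0] == '\0')
    return 0;
  char a = prev[len - 1], b = next[0];
  int prev_is_number = isdigit((unsigned char) prev[0])
    || (prev[0] == '.' && isdigit((unsigned char) prev[1]));

  if (ident_char(a) && (ident_char(b) || b == '\'' || b == '"'))
    return 1;  /* foo bar -> foobar, L 'x' -> L'x' */
  if (prev_is_number && (b == '.' || ((b == '+' || b == '-') && strchr("eEpP", a))))
    return 1;  /* 1 .5 -> 1.5, 1e + -> 1e+ */
  if (a == '.' && isdigit((unsigned char) b))
    return 1;  /* . 5 -> .5 */
  if (strchr("+-*/%<>=!&|^#:.", a) && strchr("+-*/%<>=!&|^#:.", b))
    return 1;  /* + + -> ++, / * -> comment start */
  return 0;
}

/* Append S to the NUL-terminated string OUT without overflowing it. */
static void append(char *out, size_t out_size, const char *s)
{
  strncat(out, s, out_size - strlen(out) - 1);
}

/* Print N tokens into OUT using the padding rules described above. */
void render(const struct tok *toks, size_t n, char *out, size_t out_size)
{
  int saw_padding = 0;              /* padding seen since the last real token */
  const struct tok *source = NULL;  /* spacing source chosen so far */
  const struct tok *prev = NULL;    /* last real token printed */

  assert(out_size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < n; i++) {
    const struct tok *t = &toks[i];
    if (t->type == TOK_PADDING) {
      if (!saw_padding) {
        saw_padding = 1;            /* the first padding token matters */
        source = t->source;
      } else if (t->source == NULL && source != NULL
                 && !(source->flags & PREV_WHITE_FLAG)) {
        source = NULL;              /* fall back to the token's own flag */
      }
      continue;
    }
    const struct tok *spacing = source ? source : t;
    /* Accidental pastes can only arise where padding marks added or
       removed tokens, so the paste check runs only after padding. */
    if ((spacing->flags & PREV_WHITE_FLAG)
        || (saw_padding && prev && needs_space(prev->spelling, t->spelling)))
      append(out, out_size, " ");
    append(out, out_size, t->spelling);
    saw_padding = 0;
    source = NULL;
    prev = t;
  }
}
```

One test uses a hand-built stream for the "[ baz] ;" example. Which padding tokens cpplib really emits there is an assumption. The stream is:

1. the opening bracket;
2. padding tokens with sources foo, bar and EMPTY, all without leading space;
3. a NULL-source padding token;
4. baz with leading whitespace;
5. two NULL-source padding tokens;
6. the closing bracket;
7. a padding token whose source is an EMPTY preceded by whitespace;
8. a NULL-source padding token;
9. the semicolon.

From that stream, render produces "[ baz] ;". A second stream, "+", then a padding token whose source has no leading space, then "+", renders as "+ +", because printing "++" would paste the two tokens.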