[PATCH] Whitespace issues during preprocessing

Wed Jan 29 12:04:11 PST 2014

On 28/01/14 07:24, Justin Bogner wrote:
> Hi Harald,
> 
> Harald van Dijk <harald at gigawatt.nl> writes:
>> Quite a while ago, I noticed that clang had some preprocessor issues
>> with whitespace when macro arguments were empty, and reported that at
>> <http://llvm.org/bugs/show_bug.cgi?id=15675>. I've now taken a closer
>> look, and found a few more problems. Attached are three patches that
>> should address these problems, and one patch that has no functional
>> effect but makes the code a bit easier to follow. They include new tests
>> and tweak one existing test. All tests pass, and to make sure I am not
>> misinterpreting the standard, I've verified that all new and changed
>> tests also pass with GCC.
>>
>> Any thoughts or comments? Can these patches be committed?
> 
> This looks pretty reasonable. I have a few minor comments inline.

Hi Justin,

Thanks for looking.

>> Fix whitespace handling in ## operator
>>
>> In x ## y, where x and y are regular tokens, whitespace between x and ##
>> is ignored, and whitespace between ## and y is also ignored. When either
>> x or y is a function argument, whitespace was preserved, but it should
>> not be. This patch removes the checks for whitespace before ## and
>> before y, and in the special case where x is an empty macro argument and
>> y is a regular token, actively removes whitespace before y.
>>
>> One existing test is affected by that change, but as clang's output now
>> matches the standard's requirements and that of GCC, I've tweaked the
>> testcase.
>> ---
>>  lib/Lex/TokenLexer.cpp                   | 27 +++++++++++++++------------
>>  test/Preprocessor/macro_paste_commaext.c |  4 ++--
>>  test/Preprocessor/macro_paste_spacing.c  | 16 +++++++++++++++-
>>  3 files changed, 32 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/Lex/TokenLexer.cpp b/lib/Lex/TokenLexer.cpp
>> index 829c1bd..485db6c 100644
>> --- a/lib/Lex/TokenLexer.cpp
>> +++ b/lib/Lex/TokenLexer.cpp
>> @@ -223,6 +223,13 @@ void TokenLexer::ExpandFunctionArguments() {
>>        continue;
>>      }
>>  
>> +    // Find out if there is a paste (##) operator before or after the token.
>> +    bool NonEmptyPasteBefore =
>> +      !ResultToks.empty() && ResultToks.back().is(tok::hashhash);
>> +    bool PasteBefore = i != 0 && Tokens[i-1].is(tok::hashhash);
>> +    bool PasteAfter = i+1 != e && Tokens[i+1].is(tok::hashhash);
>> +    assert(!NonEmptyPasteBefore || PasteBefore);
>> +
>>      // Otherwise, if this is not an argument token, just add the token to the
>>      // output buffer.
>>      IdentifierInfo *II = CurTok.getIdentifierInfo();
>> @@ -234,6 +241,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>        if (NextTokGetsSpace) {
>>          ResultToks.back().setFlag(Token::LeadingSpace);
>>          NextTokGetsSpace = false;
>> +      } else if (PasteBefore && !NonEmptyPasteBefore) {
>> +        ResultToks.back().clearFlag(Token::LeadingSpace);
>>        }
> 
> The braces aren't necessary for the else if block here.

Right. I've seen if-statements in clang, even in this specific file,
both with and without braces where a single statement is used. I have no
preference in this case; if you prefer not having the braces, I will
remove them.

>>        continue;
>>      }
>> @@ -242,13 +251,7 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // input.
>>      MadeChange = true;
>>  
>> -    // Otherwise, this is a use of the argument.  Find out if there is a paste
>> -    // (##) operator before or after the argument.
>> -    bool NonEmptyPasteBefore =
>> -      !ResultToks.empty() && ResultToks.back().is(tok::hashhash);
>> -    bool PasteBefore = i != 0 && Tokens[i-1].is(tok::hashhash);
>> -    bool PasteAfter = i+1 != e && Tokens[i+1].is(tok::hashhash);
>> -    assert(!NonEmptyPasteBefore || PasteBefore);
>> +    // Otherwise, this is a use of the argument.
>>  
>>      // In Microsoft mode, remove the comma before __VA_ARGS__ to ensure there
>>      // are no trailing commas if __VA_ARGS__ is empty.
>> @@ -358,8 +361,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>        // assembler-with-cpp mode, invalid pastes are allowed through: in this
>>        // case, we do not want the extra whitespace to be added.  For example,
>>        // we want ". ## foo" -> ".foo" not ". foo".
>> -      if ((CurTok.hasLeadingSpace() || NextTokGetsSpace) &&
>> -          !NonEmptyPasteBefore)
>> +      if ((CurTok.hasLeadingSpace() && !PasteBefore) ||
>> +          (NextTokGetsSpace && !NonEmptyPasteBefore))
>>          ResultToks[ResultToks.size()-NumToks].setFlag(Token::LeadingSpace);
> 
> Does the comment need to be updated here?

As far as I can tell, my changes do not require an update of the comment
here, but the comment is out of date and . ## foo becomes . foo in
assembler-with-cpp mode even without my changes. (That is what I am
getting with clang with my changes, and clang 3.3, anyway.) It should be
easy to fix, and if so I will do so and add a test for it.

>>        NextTokGetsSpace = false;
>> @@ -370,11 +373,11 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // 6.10.3.3p2,3) calls for a bunch of placemarker stuff to occur.  We
>>      // implement this by eating ## operators when a LHS or RHS expands to
>>      // empty.
>> -    NextTokGetsSpace |= CurTok.hasLeadingSpace();
>> +    if (!PasteBefore)
>> +      NextTokGetsSpace |= i != 0 && CurTok.hasLeadingSpace();
>>      if (PasteAfter) {
>>        // Discard the argument token and skip (don't copy to the expansion
>>        // buffer) the paste operator after it.
>> -      NextTokGetsSpace |= Tokens[i+1].hasLeadingSpace();
>>        ++i;
>>        continue;
>>      }
>> @@ -385,7 +388,7 @@ void TokenLexer::ExpandFunctionArguments() {
>>      assert(PasteBefore);
>>      if (NonEmptyPasteBefore) {
>>        assert(ResultToks.back().is(tok::hashhash));
>> -      NextTokGetsSpace |= ResultToks.pop_back_val().hasLeadingSpace();
>> +      ResultToks.pop_back();
>>      }
>>  
>>      // If this is the __VA_ARGS__ token, and if the argument wasn't provided,
>> diff --git a/test/Preprocessor/macro_paste_commaext.c b/test/Preprocessor/macro_paste_commaext.c
>> index 7cfe43d..fdb8f98 100644
>> --- a/test/Preprocessor/macro_paste_commaext.c
>> +++ b/test/Preprocessor/macro_paste_commaext.c
>> @@ -1,8 +1,8 @@
>>  // RUN: %clang_cc1 %s -E | grep 'V);'
>>  // RUN: %clang_cc1 %s -E | grep 'W, 1, 2);'
>>  // RUN: %clang_cc1 %s -E | grep 'X, 1, 2);'
>> -// RUN: %clang_cc1 %s -E | grep 'Y, );'
>> -// RUN: %clang_cc1 %s -E | grep 'Z, );'
>> +// RUN: %clang_cc1 %s -E | grep 'Y,);'
>> +// RUN: %clang_cc1 %s -E | grep 'Z,);'
>>  
>>  #define debug(format, ...) format, ## __VA_ARGS__)
>>  debug(V);
>> diff --git a/test/Preprocessor/macro_paste_spacing.c b/test/Preprocessor/macro_paste_spacing.c
>> index 6498ffc..4ce8e9d 100644
>> --- a/test/Preprocessor/macro_paste_spacing.c
>> +++ b/test/Preprocessor/macro_paste_spacing.c
>> @@ -1,7 +1,21 @@
>> -// RUN: %clang_cc1 %s -E | grep "^xy$"
>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>  
>>  #define A  x ## y
>>  blah
>>  
>>  A
>> +// CHECK: {{^}}xy{{$}}
>>  
>> +#define B(x, y) [v ## w] [ v##w] [w ## x] [ w##x] [x ## y] [ x##y] [y ## z] [ y##z]
>> +B(x,y)
>> +// CHECK: [vw] [ vw] [wx] [ wx] [xy] [ xy] [yz] [ yz]
>> +B(x,)
>> +// CHECK: [vw] [ vw] [wx] [ wx] [x] [ x] [z] [ z]
>> +B(,y)
>> +// CHECK: [vw] [ vw] [w] [ w] [y] [ y] [yz] [ yz]
>> +B(,)
>> +// CHECK: [vw] [ vw] [w] [ w] [] [ ] [z] [ z]
> 
> These all deal with leading space, but what about trailing space? Should
> there be a [v##w ] test?

White space is treated as a property of the following token in clang. In
[x##y], the tokens that are part of the concatenation are x, ##, and y,
so I have included the relevant checks to see that a space before x, a
space before ##, and a space before y all get handled correctly. Neither
the [ nor the ] token are part of the concatenation, which is why I have
not included those in the test, but if you prefer I will be happy to add
them anyway. It could be useful to verify that there is no code that
unintentionally removes the space, especially considering I specifically
added code that does intentionally remove spacing elsewhere in this test.

>> +
>> +#define C(x, y, z) [x ## y ## z]
>> +C(,,) C(a,,) C(,b,) C(,,c) C(a,b,) C(a,,c) C(,b,c) C(a,b,c)
>> +// CHECK: [] [a] [b] [c] [ab] [ac] [bc] [abc]
>> -- 
>> 1.8.5.2
>>
> 
>> Fix whitespace handling in empty macro arguments
>>
>> When a function-like macro definition ends with one of the macro's
>> parameters, and the argument is empty, any whitespace before the
>> parameter name in the macro definition needs to be preserved. Promoting
>> the existing NextTokGetsSpace to a preserved bit-field and checking it
>> at the end of the macro expansion allows it to be moved to the first
>> token following the macro expansion result.
>> ---
>>  include/clang/Lex/TokenLexer.h      |  8 ++++++++
>>  lib/Lex/TokenLexer.cpp              | 38 +++++++++++++++++++------------------
>>  test/Preprocessor/macro_arg_empty.c |  7 +++++++
>>  3 files changed, 35 insertions(+), 18 deletions(-)
>>  create mode 100644 test/Preprocessor/macro_arg_empty.c
>>
>> diff --git a/include/clang/Lex/TokenLexer.h b/include/clang/Lex/TokenLexer.h
>> index 7c8cfd0..247c07b 100644
>> --- a/include/clang/Lex/TokenLexer.h
>> +++ b/include/clang/Lex/TokenLexer.h
>> @@ -81,6 +81,14 @@ class TokenLexer {
>>    bool AtStartOfLine : 1;
>>    bool HasLeadingSpace : 1;
>>  
>> +  // NextTokGetsSpace - When this is true, the next token appended to the
>> +  // output list during function argument expansion will get a leading space,
>> +  // regardless of whether it had one to begin with or not. This is used for
>> +  // placemarker support. If still true after function argument expansion, the
>> +  // leading space will be applied to the first token following the macro
>> +  // expansion.
>> +  bool NextTokGetsSpace : 1;
>> +
>>    /// OwnsTokens - This is true if this TokenLexer allocated the Tokens
>>    /// array, and thus needs to free it when destroyed.  For simple object-like
>>    /// macros (for example) we just point into the token buffer of the macro
>> diff --git a/lib/Lex/TokenLexer.cpp b/lib/Lex/TokenLexer.cpp
>> index 485db6c..921434b 100644
>> --- a/lib/Lex/TokenLexer.cpp
>> +++ b/lib/Lex/TokenLexer.cpp
>> @@ -37,6 +37,7 @@ void TokenLexer::Init(Token &Tok, SourceLocation ELEnd, MacroInfo *MI,
>>    ExpandLocEnd = ELEnd;
>>    AtStartOfLine = Tok.isAtStartOfLine();
>>    HasLeadingSpace = Tok.hasLeadingSpace();
>> +  NextTokGetsSpace = false;
>>    Tokens = &*Macro->tokens_begin();
>>    OwnsTokens = false;
>>    DisableMacroExpansion = false;
>> @@ -95,6 +96,7 @@ void TokenLexer::Init(const Token *TokArray, unsigned NumToks,
>>    ExpandLocStart = ExpandLocEnd = SourceLocation();
>>    AtStartOfLine = false;
>>    HasLeadingSpace = false;
>> +  NextTokGetsSpace = false;
>>    MacroExpansionStart = SourceLocation();
>>  
>>    // Set HasLeadingSpace/AtStartOfLine so that the first token will be
>> @@ -122,7 +124,6 @@ void TokenLexer::destroy() {
>>  /// Remove comma ahead of __VA_ARGS__, if present, according to compiler dialect
>>  /// settings.  Returns true if the comma is removed.
>>  static bool MaybeRemoveCommaBeforeVaArgs(SmallVectorImpl<Token> &ResultToks,
>> -                                         bool &NextTokGetsSpace,
>>                                           bool HasPasteOperator,
>>                                           MacroInfo *Macro, unsigned MacroArgNo,
>>                                           Preprocessor &PP) {
>> @@ -163,8 +164,6 @@ static bool MaybeRemoveCommaBeforeVaArgs(SmallVectorImpl<Token> &ResultToks,
>>    if (!ResultToks.empty() && ResultToks.back().is(tok::hashhash))
>>      ResultToks.pop_back();
>>  
>> -  // Never add a space, even if the comma, ##, or arg had a space.
>> -  NextTokGetsSpace = false;
>>    return true;
>>  }
> 
> Should MaybeRemoveCommaBeforeVaArgs be made a private method of
> TokenLexer rather tahn static? This way you wouldn't need to change this
> assignment and duplicate the logic for all of the callers.

Agreed, that would be simpler. I will change this.

>> @@ -179,11 +178,6 @@ void TokenLexer::ExpandFunctionArguments() {
>>    // we install the newly expanded sequence as the new 'Tokens' list.
>>    bool MadeChange = false;
>>  
>> -  // NextTokGetsSpace - When this is true, the next token appended to the
>> -  // output list will get a leading space, regardless of whether it had one to
>> -  // begin with or not.  This is used for placemarker support.
>> -  bool NextTokGetsSpace = false;
>> -
>>    for (unsigned i = 0, e = NumTokens; i != e; ++i) {
>>      // If we found the stringify operator, get the argument stringified.  The
>>      // preprocessor already verified that the following token is a macro name
>> @@ -256,10 +250,13 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // In Microsoft mode, remove the comma before __VA_ARGS__ to ensure there
>>      // are no trailing commas if __VA_ARGS__ is empty.
>>      if (!PasteBefore && ActualArgs->isVarargsElidedUse() &&
>> -        MaybeRemoveCommaBeforeVaArgs(ResultToks, NextTokGetsSpace,
>> +        MaybeRemoveCommaBeforeVaArgs(ResultToks,
>>                                       /*HasPasteOperator=*/false,
>> -                                     Macro, ArgNo, PP))
>> +                                     Macro, ArgNo, PP)) {
>> +      // Never add a space, even if the comma or arg had a space.
>> +      NextTokGetsSpace = false;
>>        continue;
>> +    }
> 
> If you agree with the above, this change isn't needed (other than the
> argument update).

Indeed.

>>      // If it is not the LHS/RHS of a ## operator, we must pre-expand the
>>      // argument and substitute the expanded tokens into the result.  This is
>> @@ -311,9 +308,10 @@ void TokenLexer::ExpandFunctionArguments() {
>>                                               NextTokGetsSpace);
>>          NextTokGetsSpace = false;
>>        } else {
>> -        // If this is an empty argument, and if there was whitespace before the
>> -        // formal token, make sure the next token gets whitespace before it.
>> -        NextTokGetsSpace = CurTok.hasLeadingSpace();
>> +        // If this is an empty argument, if there was whitespace before the
>> +        // formal token, and this is not the first token in the macro
>> +        // definition, make sure the next token gets whitespace before it.
>> +        NextTokGetsSpace |= i != 0 && CurTok.hasLeadingSpace();
>>        }
>>        continue;
>>      }
>> @@ -395,10 +393,14 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // and if the macro had at least one real argument, and if the token before
>>      // the ## was a comma, remove the comma.  This is a GCC extension which is
>>      // disabled when using -std=c99.
>> -    if (ActualArgs->isVarargsElidedUse())
>> -      MaybeRemoveCommaBeforeVaArgs(ResultToks, NextTokGetsSpace,
>> -                                   /*HasPasteOperator=*/true,
>> -                                   Macro, ArgNo, PP);
>> +    if (ActualArgs->isVarargsElidedUse() &&
>> +        MaybeRemoveCommaBeforeVaArgs(ResultToks,
>> +                                     /*HasPasteOperator=*/true,
>> +                                     Macro, ArgNo, PP))
>> +    {
>> +      // Never add a space, even if the comma, ##, or arg had a space.
>> +      NextTokGetsSpace = false;
>> +    }
> 
> Same thing here. If this does stay, the braces should be removed.
> 
>>      continue;
>>    }
>> @@ -428,7 +430,7 @@ bool TokenLexer::Lex(Token &Tok) {
>>  
>>      Tok.startToken();
>>      Tok.setFlagValue(Token::StartOfLine , AtStartOfLine);
>> -    Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace);
>> +    Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace || NextTokGetsSpace);
>>      if (CurToken == 0)
>>        Tok.setFlag(Token::LeadingEmptyMacro);
>>      return PP.HandleEndOfTokenLexer(Tok);
>> diff --git a/test/Preprocessor/macro_arg_empty.c b/test/Preprocessor/macro_arg_empty.c
>> new file mode 100644
>> index 0000000..09b8404
>> --- /dev/null
>> +++ b/test/Preprocessor/macro_arg_empty.c
>> @@ -0,0 +1,7 @@
>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>> +
>> +#define FOO(x) x
>> +#define BAR(x) x x
>> +#define BAZ(x) [x] [ x]
>> +[FOO()] [ FOO()] [BAR()] BAZ()
>> +// CHECK: [] [ ] [ ] [] [ ]
> 
> The question about trailing whitespace test cases applies here too.

Roughly the same reason as above applies here too, and again I will be
happy to change this.

>> -- 
>> 1.8.5.2
>>
>>
>> Fix whitespace handling in empty macro expansions
>>
>> When a macro expansion does not result in any tokens, and the macro name
>> is preceded by whitespace, the whitespace should be passed to the first
>> token that follows the macro expansion. Similarly when a macro expansion
>> ends with a placemarker token, and that placemarker token is preceded by
>> whitespace. This worked already for top-level macro expansions, but is
>> now extended to also work for nested macro expansions.
>> ---
>>  lib/Lex/TokenLexer.cpp          |  8 ++++++--
>>  test/Preprocessor/macro_space.c | 36 +++++++++++++++++++++++++++++++++---
>>  2 files changed, 39 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/Lex/TokenLexer.cpp b/lib/Lex/TokenLexer.cpp
>> index 921434b..1769f8e 100644
>> --- a/lib/Lex/TokenLexer.cpp
>> +++ b/lib/Lex/TokenLexer.cpp
>> @@ -484,9 +484,13 @@ bool TokenLexer::Lex(Token &Tok) {
>>    if (isFirstToken) {
>>      Tok.setFlagValue(Token::StartOfLine , AtStartOfLine);
>>      Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace);
>> -    AtStartOfLine = false;
>> -    HasLeadingSpace = false;
>> +  } else {
>> +    // If this is not the first token, we may still need to pass through
>> +    // leading whitespace if we've expanded a macro.
>> +    if (HasLeadingSpace) Tok.setFlag(Token::LeadingSpace);
> 
> This would be clearer as an else if, rather than an if inside an else.

Are you sure that would be clearer? I considered that, but opted for
this form because to me, it makes it easier to see that the if- and
else-block are for the first-token and other-tokens cases, and because
the fact that the other-tokens case consists of a single if-statement is
largely a coincidence. There are other checks that could make sense in
the other-tokens case, in which case an else-if is not possible, such as
a check for AtStartOfLine. (I think ignoring that, as I am doing now,
makes sense, but was unable to come up with a test case where it mattered.)

>>    }
>> +  AtStartOfLine = false;
>> +  HasLeadingSpace = false;
> 
> Why did you move these assignments out of the if? Changing the behaviour
> of AtStartOfLine certainly seems unrelated to the rest of your changes,
> which makes me think this might be a mistake.

When a nested macro expansion gives an empty result,
TokenLexer::PropagateLineStartLeadingSpaceInfo ends up being called and
sets both AtStartOfLine and HasLeadingSpace. When the expansion of the
outer macro continues, and another token is found, those values should
be used for that other token and then discarded. Clearing AtStartOfLine
and HasLeadingSpace inside the if-block would mean they do not get
cleared and continue to be used for all the rest of the tokens. This was
already a bit of a problem before my change, but my change makes clang
use HasLeadingSpace in cases where it would previously be ignored, so in
those cases it is now necessary for its value to be correct.

Moving the clearing of HasLeadingSpace back inside the if-block, and
running the test suite, should show that problem. I don't think moving
the clearing of AtStartOfLine back inside the if-block would have any
visible effect, but it logically belongs outside of it for the same
reason, and it logically belongs together with the clearing of
HasLeadingSpace.

>>  
>>    // Handle recursive expansion!
>>    if (!Tok.isAnnotation() && Tok.getIdentifierInfo() != 0) {
>> diff --git a/test/Preprocessor/macro_space.c b/test/Preprocessor/macro_space.c
>> index 8a47a3b..13e531f 100644
>> --- a/test/Preprocessor/macro_space.c
>> +++ b/test/Preprocessor/macro_space.c
>> @@ -1,6 +1,36 @@
>>  // RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>  
>> -#define XX
>> -! XX,
>> +#define FOO1()
>> +#define FOO2(x)x
>> +#define FOO3(x) x
>> +#define FOO4(x)x x
>> +#define FOO5(x) x x
>> +#define FOO6(x) [x]
>> +#define FOO7(x) [ x]
>> +#define FOO8(x) [x ]
>>  
>> -// CHECK: {{^}}! ,{{$}}
>> +#define TEST(FOO,x) FOO <FOO()> < FOO()> <FOO ()> <FOO( )> <FOO() > <FOO()x> <FOO() x> < FOO()x>
>> +
>> +TEST(FOO1,)
>> +// CHECK: FOO1 <> < > <> <> < > <> < > < >
>> +
>> +TEST(FOO2,)
>> +// CHECK: FOO2 <> < > <> <> < > <> < > < >
>> +
>> +TEST(FOO3,)
>> +// CHECK: FOO3 <> < > <> <> < > <> < > < >
>> +
>> +TEST(FOO4,)
>> +// CHECK: FOO4 < > < > < > < > < > < > < > < >
>> +
>> +TEST(FOO5,)
>> +// CHECK: FOO5 < > < > < > < > < > < > < > < >
>> +
>> +TEST(FOO6,)
>> +// CHECK: FOO6 <[]> < []> <[]> <[]> <[] > <[]> <[] > < []>
>> +
>> +TEST(FOO7,)
>> +// CHECK: FOO7 <[ ]> < [ ]> <[ ]> <[ ]> <[ ] > <[ ]> <[ ] > < [ ]>
>> +
>> +TEST(FOO8,)
>> +// CHECK: FOO8 <[ ]> < [ ]> <[ ]> <[ ]> <[ ] > <[ ]> <[ ] > < [ ]>
>> -- 
>> 1.8.5.2
>>
> 
>> Clean up whitespace checks
>>
>> In TokenLexer::ExpandFunctionArguments(), CurTok.hasLeadingSpace() is
>> checked in multiple locations, each time subtly differently. Checking it
>> early, when the token is seen, and using NextTokGetsSpace exclusively
>> after that makes the code simpler.
>>
>> No change in behaviour is intended.
> 
> I like this change! I was going to complain about all of the i != 0
> checks in the earlier changes before I saw this :)

Thanks! :)

Cheers,
Harald van Dijk

>> ---
>>  lib/Lex/TokenLexer.cpp | 16 +++++-----------
>>  1 file changed, 5 insertions(+), 11 deletions(-)
>>
>> diff --git a/lib/Lex/TokenLexer.cpp b/lib/Lex/TokenLexer.cpp
>> index 1769f8e..23bc2da 100644
>> --- a/lib/Lex/TokenLexer.cpp
>> +++ b/lib/Lex/TokenLexer.cpp
>> @@ -183,6 +183,9 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // preprocessor already verified that the following token is a macro name
>>      // when the #define was parsed.
>>      const Token &CurTok = Tokens[i];
>> +    if (i != 0 && !Tokens[i-1].is(tok::hashhash) && CurTok.hasLeadingSpace())
>> +      NextTokGetsSpace = true;
>> +
>>      if (CurTok.is(tok::hash) || CurTok.is(tok::hashat)) {
>>        int ArgNo = Macro->getArgumentNum(Tokens[i+1].getIdentifierInfo());
>>        assert(ArgNo != -1 && "Token following # is not an argument?");
>> @@ -207,7 +210,7 @@ void TokenLexer::ExpandFunctionArguments() {
>>  
>>        // The stringified/charified string leading space flag gets set to match
>>        // the #/#@ operator.
>> -      if (CurTok.hasLeadingSpace() || NextTokGetsSpace)
>> +      if (NextTokGetsSpace)
>>          Res.setFlag(Token::LeadingSpace);
>>  
>>        ResultToks.push_back(Res);
>> @@ -304,14 +307,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>          // before the first token should match the whitespace of the arg
>>          // identifier.
>>          ResultToks[FirstResult].setFlagValue(Token::LeadingSpace,
>> -                                             CurTok.hasLeadingSpace() ||
>>                                               NextTokGetsSpace);
>>          NextTokGetsSpace = false;
>> -      } else {
>> -        // If this is an empty argument, if there was whitespace before the
>> -        // formal token, and this is not the first token in the macro
>> -        // definition, make sure the next token gets whitespace before it.
>> -        NextTokGetsSpace |= i != 0 && CurTok.hasLeadingSpace();
>>        }
>>        continue;
Moving the clearing of HasLeadingSpace back inside the if-block, and
running the test suite, should show that problem. I don't think moving
the clearing of AtStartOfLine back inside the if-block would have any
visible effect, but it logically belongs outside of it for the same
reason, and it logically belongs together with the clearing of
HasLeadingSpace.

>>      }
>> @@ -359,8 +356,7 @@ void TokenLexer::ExpandFunctionArguments() {
>>        // assembler-with-cpp mode, invalid pastes are allowed through: in this
>>        // case, we do not want the extra whitespace to be added.  For example,
>>        // we want ". ## foo" -> ".foo" not ". foo".
>> -      if ((CurTok.hasLeadingSpace() && !PasteBefore) ||
>> -          (NextTokGetsSpace && !NonEmptyPasteBefore))
>> +      if (NextTokGetsSpace)
>>          ResultToks[ResultToks.size()-NumToks].setFlag(Token::LeadingSpace);
>>  
>>        NextTokGetsSpace = false;
>> @@ -371,8 +367,6 @@ void TokenLexer::ExpandFunctionArguments() {
>>      // 6.10.3.3p2,3) calls for a bunch of placemarker stuff to occur.  We
>>      // implement this by eating ## operators when a LHS or RHS expands to
>>      // empty.
>> -    if (!PasteBefore)
>> -      NextTokGetsSpace |= i != 0 && CurTok.hasLeadingSpace();
>>      if (PasteAfter) {
>>        // Discard the argument token and skip (don't copy to the expansion
>>        // buffer) the paste operator after it.
> 
> -- Justin Bogner
>