[PATCH] Whitespace issues during preprocessing

Wed Jan 29 15:09:59 PST 2014

Harald van Dijk <harald at gigawatt.nl> writes:
> On 28/01/14 07:24, Justin Bogner wrote:
>> This looks pretty reasonable. I have a few minor comments inline.
>
> Hi Justin,
>
> Thanks for looking.
>

>>> @@ -234,6 +241,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>>        if (NextTokGetsSpace) {
>>>          ResultToks.back().setFlag(Token::LeadingSpace);
>>>          NextTokGetsSpace = false;
>>> +      } else if (PasteBefore && !NonEmptyPasteBefore) {
>>> +        ResultToks.back().clearFlag(Token::LeadingSpace);
>>>        }
>> 
>> The braces aren't necessary for the else if block here.
>
> Right. I've seen if-statements in clang, even in this specific file,
> both with and without braces where a single statement is used. I have no
> preference in this case; if you prefer not having the braces, I will
> remove them.

I'm under the impression that without is preferred, but I don't feel
terribly strongly either. Use your discretion :)

>>> @@ -358,8 +361,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>> // assembler-with-cpp mode, invalid pastes are allowed through: in
> this
>>> // case, we do not want the extra whitespace to be added.  For
> example,
>>>        // we want ". ## foo" -> ".foo" not ". foo".
>>> -      if ((CurTok.hasLeadingSpace() || NextTokGetsSpace) &&
>>> -          !NonEmptyPasteBefore)
>>> +      if ((CurTok.hasLeadingSpace() && !PasteBefore) ||
>>> +          (NextTokGetsSpace && !NonEmptyPasteBefore))
>>>          ResultToks[ResultToks.size()-NumToks].setFlag(Token::LeadingSpace);
>> 
>> Does the comment need to be updated here?
>
> As far as I can tell, my changes do not require an update of the comment
> here, but the comment is out of date and . ## foo becomes . foo in
> assembler-with-cpp mode even without my changes. (That is what I am
> getting with clang with my changes, and clang 3.3, anyway.) It should be
> easy to fix, and if so I will do so and add a test for it.

Sure. Please look into it. It's never good when the comment disagrees
with the actual behaviour.

>>> @@ -1,7 +1,21 @@
>>> -// RUN: %clang_cc1 %s -E | grep "^xy$"
>>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>>  
>>>  #define A  x ## y
>>>  blah
>>>  
>>>  A
>>> +// CHECK: {{^}}xy{{$}}
>>>  
>>> +#define B(x, y) [v ## w] [ v##w] [w ## x] [ w##x] [x ## y] [ x##y]
> [y ## z] [ y##z]
>>> +B(x,y)
>>> +// CHECK: [vw] [ vw] [wx] [ wx] [xy] [ xy] [yz] [ yz]
>>> +B(x,)
>>> +// CHECK: [vw] [ vw] [wx] [ wx] [x] [ x] [z] [ z]
>>> +B(,y)
>>> +// CHECK: [vw] [ vw] [w] [ w] [y] [ y] [yz] [ yz]
>>> +B(,)
>>> +// CHECK: [vw] [ vw] [w] [ w] [] [ ] [z] [ z]
>> 
>> These all deal with leading space, but what about trailing space? Should
>> there be a [v##w ] test?
>
> White space is treated as a property of the following token in clang. In
> [x##y], the tokens that are part of the concatenation are x, ##, and y,
> so I have included the relevant checks to see that a space before x, a
> space before ##, and a space before y all get handled correctly. Neither
> the [ nor the ] token are part of the concatenation, which is why I have
> not included those in the test, but if you prefer I will be happy to add
> them anyway. It could be useful to verify that there is no code that
> unintentionally removes the space, especially considering I specifically
> added code that does intentionally remove spacing elsewhere in this test.

Please add them anyway. More tests can't hurt.

>>> @@ -163,8 +164,6 @@ static bool
> MaybeRemoveCommaBeforeVaArgs(SmallVectorImpl<Token> &ResultToks,
>>>    if (!ResultToks.empty() && ResultToks.back().is(tok::hashhash))
>>>      ResultToks.pop_back();
>>>  
>>> -  // Never add a space, even if the comma, ##, or arg had a space.
>>> -  NextTokGetsSpace = false;
>>>    return true;
>>>  }
>> 
>> Should MaybeRemoveCommaBeforeVaArgs be made a private method of
>> TokenLexer rather tahn static? This way you wouldn't need to change this
>> assignment and duplicate the logic for all of the callers.
>
> Agreed, that would be simpler. I will change this.
>

>>> @@ -256,10 +250,13 @@ void TokenLexer::ExpandFunctionArguments() {
>>> // In Microsoft mode, remove the comma before __VA_ARGS__ to ensure
> there
>>>      // are no trailing commas if __VA_ARGS__ is empty.
>>>      if (!PasteBefore && ActualArgs->isVarargsElidedUse() &&
>>> -        MaybeRemoveCommaBeforeVaArgs(ResultToks, NextTokGetsSpace,
>>> +        MaybeRemoveCommaBeforeVaArgs(ResultToks,
>>>                                       /*HasPasteOperator=*/false,
>>> -                                     Macro, ArgNo, PP))
>>> +                                     Macro, ArgNo, PP)) {
>>> +      // Never add a space, even if the comma or arg had a space.
>>> +      NextTokGetsSpace = false;
>>>        continue;
>>> +    }
>> 
>> If you agree with the above, this change isn't needed (other than the
>> argument update).
>
> Indeed.

>>> @@ -0,0 +1,7 @@
>>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>> +
>>> +#define FOO(x) x
>>> +#define BAR(x) x x
>>> +#define BAZ(x) [x] [ x]
>>> +[FOO()] [ FOO()] [BAR()] BAZ()
>>> +// CHECK: [] [ ] [ ] [] [ ]
>> 
>> The question about trailing whitespace test cases applies here too.
>
> Roughly the same reason as above applies here too, and again I will be
> happy to change this.

>>> @@ -484,9 +484,13 @@ bool TokenLexer::Lex(Token &Tok) {
>>>    if (isFirstToken) {
>>>      Tok.setFlagValue(Token::StartOfLine , AtStartOfLine);
>>>      Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace);
>>> -    AtStartOfLine = false;
>>> -    HasLeadingSpace = false;
>>> +  } else {
>>> +    // If this is not the first token, we may still need to pass through
>>> +    // leading whitespace if we've expanded a macro.
>>> +    if (HasLeadingSpace) Tok.setFlag(Token::LeadingSpace);
>> 
>> This would be clearer as an else if, rather than an if inside an else.
>
> Are you sure that would be clearer? I considered that, but opted for
> this form because to me, it makes it easier to see that the if- and
> else-block are for the first-token and other-tokens cases, and because
> the fact that the other-tokens case consists of a single if-statement is
> largely a coincidence. There are other checks that could make sense in
> the other-tokens case, in which case an else-if is not possible, such as
> a check for AtStartOfLine. (I think ignoring that, as I am doing now,
> makes sense, but was unable to come up with a test case where it mattered.)

This argument is convincing. This is fine as is.

>>>    }
>>> +  AtStartOfLine = false;
>>> +  HasLeadingSpace = false;
>> 
>> Why did you move these assignments out of the if? Changing the behaviour
>> of AtStartOfLine certainly seems unrelated to the rest of your changes,
>> which makes me think this might be a mistake.
>
> When a nested macro expansion gives an empty result,
> TokenLexer::PropagateLineStartLeadingSpaceInfo ends up being called and
> sets both AtStartOfLine and HasLeadingSpace. When the expansion of the
> outer macro continues, and another token is found, those values should
> be used for that other token and then discarded. Clearing AtStartOfLine
> and HasLeadingSpace inside the if-block would mean they do not get
> cleared and continue to be used for all the rest of the tokens. This was
> already a bit of a problem before my change, but my change makes clang
> use HasLeadingSpace in cases where it would previously be ignored, so in
> those cases it is now necessary for its value to be correct.
>
> Moving the clearing of HasLeadingSpace back inside the if-block, and
> running the test suite, should show that problem. I don't think moving
> the clearing of AtStartOfLine back inside the if-block would have any
> visible effect, but it logically belongs outside of it for the same
> reason, and it logically belongs together with the clearing of
> HasLeadingSpace.

Okay.

>>> In TokenLexer::ExpandFunctionArguments(), CurTok.hasLeadingSpace() is
>>> checked in multiple locations, each time subtly differently. Checking it
>>> early, when the token is seen, and using NextTokGetsSpace exclusively
>>> after that makes the code simpler.
>>>
>>> No change in behaviour is intended.
>> 
>> I like this change! I was going to complain about all of the i != 0
>> checks in the earlier changes before I saw this :)
>
> Thanks! :)