[PATCH] Whitespace issues during preprocessing
Justin Bogner
mail at justinbogner.com
Wed Jan 29 15:09:59 PST 2014
Harald van Dijk <harald at gigawatt.nl> writes:
> On 28/01/14 07:24, Justin Bogner wrote:
>> This looks pretty reasonable. I have a few minor comments inline.
>
> Hi Justin,
>
> Thanks for looking.
>
>>> @@ -234,6 +241,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>> if (NextTokGetsSpace) {
>>> ResultToks.back().setFlag(Token::LeadingSpace);
>>> NextTokGetsSpace = false;
>>> + } else if (PasteBefore && !NonEmptyPasteBefore) {
>>> + ResultToks.back().clearFlag(Token::LeadingSpace);
>>> }
>>
>> The braces aren't necessary for the else if block here.
>
> Right. I've seen if-statements in clang, even in this specific file,
> both with and without braces where a single statement is used. I have no
> preference in this case; if you prefer not having the braces, I will
> remove them.
I'm under the impression that without is preferred, but I don't feel
terribly strongly either. Use your discretion :)
>>> @@ -358,8 +361,8 @@ void TokenLexer::ExpandFunctionArguments() {
>>> // assembler-with-cpp mode, invalid pastes are allowed through: in
> this
>>> // case, we do not want the extra whitespace to be added. For
> example,
>>> // we want ". ## foo" -> ".foo" not ". foo".
>>> - if ((CurTok.hasLeadingSpace() || NextTokGetsSpace) &&
>>> - !NonEmptyPasteBefore)
>>> + if ((CurTok.hasLeadingSpace() && !PasteBefore) ||
>>> + (NextTokGetsSpace && !NonEmptyPasteBefore))
>>> ResultToks[ResultToks.size()-NumToks].setFlag(Token::LeadingSpace);
>>
>> Does the comment need to be updated here?
>
> As far as I can tell, my changes do not require an update of the comment
> here, but the comment is out of date and . ## foo becomes . foo in
> assembler-with-cpp mode even without my changes. (That is what I am
> getting with clang with my changes, and clang 3.3, anyway.) It should be
> easy to fix, and if so I will do so and add a test for it.
Sure. Please look into it. It's never good when the comment disagrees
with the actual behaviour.
>>> @@ -1,7 +1,21 @@
>>> -// RUN: %clang_cc1 %s -E | grep "^xy$"
>>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>>
>>> #define A x ## y
>>> blah
>>>
>>> A
>>> +// CHECK: {{^}}xy{{$}}
>>>
>>> +#define B(x, y) [v ## w] [ v##w] [w ## x] [ w##x] [x ## y] [ x##y]
> [y ## z] [ y##z]
>>> +B(x,y)
>>> +// CHECK: [vw] [ vw] [wx] [ wx] [xy] [ xy] [yz] [ yz]
>>> +B(x,)
>>> +// CHECK: [vw] [ vw] [wx] [ wx] [x] [ x] [z] [ z]
>>> +B(,y)
>>> +// CHECK: [vw] [ vw] [w] [ w] [y] [ y] [yz] [ yz]
>>> +B(,)
>>> +// CHECK: [vw] [ vw] [w] [ w] [] [ ] [z] [ z]
>>
>> These all deal with leading space, but what about trailing space? Should
>> there be a [v##w ] test?
>
> White space is treated as a property of the following token in clang. In
> [x##y], the tokens that are part of the concatenation are x, ##, and y,
> so I have included the relevant checks to see that a space before x, a
> space before ##, and a space before y all get handled correctly. Neither
> the [ nor the ] token are part of the concatenation, which is why I have
> not included those in the test, but if you prefer I will be happy to add
> them anyway. It could be useful to verify that there is no code that
> unintentionally removes the space, especially considering I specifically
> added code that does intentionally remove spacing elsewhere in this test.
Please add them anyway. More tests can't hurt.
>>> @@ -163,8 +164,6 @@ static bool
> MaybeRemoveCommaBeforeVaArgs(SmallVectorImpl<Token> &ResultToks,
>>> if (!ResultToks.empty() && ResultToks.back().is(tok::hashhash))
>>> ResultToks.pop_back();
>>>
>>> - // Never add a space, even if the comma, ##, or arg had a space.
>>> - NextTokGetsSpace = false;
>>> return true;
>>> }
>>
>> Should MaybeRemoveCommaBeforeVaArgs be made a private method of
>> TokenLexer rather tahn static? This way you wouldn't need to change this
>> assignment and duplicate the logic for all of the callers.
>
> Agreed, that would be simpler. I will change this.
>
>>> @@ -256,10 +250,13 @@ void TokenLexer::ExpandFunctionArguments() {
>>> // In Microsoft mode, remove the comma before __VA_ARGS__ to ensure
> there
>>> // are no trailing commas if __VA_ARGS__ is empty.
>>> if (!PasteBefore && ActualArgs->isVarargsElidedUse() &&
>>> - MaybeRemoveCommaBeforeVaArgs(ResultToks, NextTokGetsSpace,
>>> + MaybeRemoveCommaBeforeVaArgs(ResultToks,
>>> /*HasPasteOperator=*/false,
>>> - Macro, ArgNo, PP))
>>> + Macro, ArgNo, PP)) {
>>> + // Never add a space, even if the comma or arg had a space.
>>> + NextTokGetsSpace = false;
>>> continue;
>>> + }
>>
>> If you agree with the above, this change isn't needed (other than the
>> argument update).
>
> Indeed.
>>> @@ -0,0 +1,7 @@
>>> +// RUN: %clang_cc1 -E %s | FileCheck --strict-whitespace %s
>>> +
>>> +#define FOO(x) x
>>> +#define BAR(x) x x
>>> +#define BAZ(x) [x] [ x]
>>> +[FOO()] [ FOO()] [BAR()] BAZ()
>>> +// CHECK: [] [ ] [ ] [] [ ]
>>
>> The question about trailing whitespace test cases applies here too.
>
> Roughly the same reason as above applies here too, and again I will be
> happy to change this.
>>> @@ -484,9 +484,13 @@ bool TokenLexer::Lex(Token &Tok) {
>>> if (isFirstToken) {
>>> Tok.setFlagValue(Token::StartOfLine , AtStartOfLine);
>>> Tok.setFlagValue(Token::LeadingSpace, HasLeadingSpace);
>>> - AtStartOfLine = false;
>>> - HasLeadingSpace = false;
>>> + } else {
>>> + // If this is not the first token, we may still need to pass through
>>> + // leading whitespace if we've expanded a macro.
>>> + if (HasLeadingSpace) Tok.setFlag(Token::LeadingSpace);
>>
>> This would be clearer as an else if, rather than an if inside an else.
>
> Are you sure that would be clearer? I considered that, but opted for
> this form because to me, it makes it easier to see that the if- and
> else-block are for the first-token and other-tokens cases, and because
> the fact that the other-tokens case consists of a single if-statement is
> largely a coincidence. There are other checks that could make sense in
> the other-tokens case, in which case an else-if is not possible, such as
> a check for AtStartOfLine. (I think ignoring that, as I am doing now,
> makes sense, but was unable to come up with a test case where it mattered.)
This argument is convincing. This is fine as is.
>>> }
>>> + AtStartOfLine = false;
>>> + HasLeadingSpace = false;
>>
>> Why did you move these assignments out of the if? Changing the behaviour
>> of AtStartOfLine certainly seems unrelated to the rest of your changes,
>> which makes me think this might be a mistake.
>
> When a nested macro expansion gives an empty result,
> TokenLexer::PropagateLineStartLeadingSpaceInfo ends up being called and
> sets both AtStartOfLine and HasLeadingSpace. When the expansion of the
> outer macro continues, and another token is found, those values should
> be used for that other token and then discarded. Clearing AtStartOfLine
> and HasLeadingSpace inside the if-block would mean they do not get
> cleared and continue to be used for all the rest of the tokens. This was
> already a bit of a problem before my change, but my change makes clang
> use HasLeadingSpace in cases where it would previously be ignored, so in
> those cases it is now necessary for its value to be correct.
>
> Moving the clearing of HasLeadingSpace back inside the if-block, and
> running the test suite, should show that problem. I don't think moving
> the clearing of AtStartOfLine back inside the if-block would have any
> visible effect, but it logically belongs outside of it for the same
> reason, and it logically belongs together with the clearing of
> HasLeadingSpace.
Okay.
>>> In TokenLexer::ExpandFunctionArguments(), CurTok.hasLeadingSpace() is
>>> checked in multiple locations, each time subtly differently. Checking it
>>> early, when the token is seen, and using NextTokGetsSpace exclusively
>>> after that makes the code simpler.
>>>
>>> No change in behaviour is intended.
>>
>> I like this change! I was going to complain about all of the i != 0
>> checks in the earlier changes before I saw this :)
>
> Thanks! :)
More information about the cfe-commits
mailing list