[PATCH] Whitespace issues during preprocessing

Tue Feb 4 13:13:25 PST 2014

On 04/02/14 20:25, Justin Bogner wrote:
> Harald van Dijk <harald at gigawatt.nl> writes:
>> Attached are my updated patches intended to fix various whitespace
>> issues in clang, modified to take Justin Bogner's comments into account.
>> They are intended to ensure that whitespace is not inappropriately
>> removed just because a macro or macro argument expansion is empty, and
>> does get removed during token pasting.
> 
> I've committed the first four patches for you: r200785 through r200788.

Thanks!

>> I have moved the handling of invalid token pastes from
>> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that it works
>> for both object- and function-like macros, and both when ##'s operands
>> use macro parameters and when they don't.
> 
> Given that the tests needed to be changed and the behaviour clearly
> hasn't followed the comment in a while, I'm not entirely convinced this
> is the right thing to do. Could the comment simply be wrong? Are people
> relying on one behaviour or the other for invalid token pastes in
> assembler-with-cpp?
> 
> Basically, is there a way to objectively say one of these behaviours is
> better than the other?

Having looked closer, the current behaviour is inconsistent in a way
that cannot really be explained to someone not familiar with a few
implementation details. Given

#define foo(x) (. ## x . ## y)

foo(y) actually expands to (.y . y) in assembler-with-cpp mode, with or
without my first four patches. That first result is what the comment
refers to; could the fact that the second result is different be an
oversight? There does not seem to be a reason for this difference, at
least. Surprisingly though, this is exactly what GCC does too.

My last patch would have turned this into (.y .y), removing the second
space. That makes sense to me. (. y . y) could also be a perfectly
sensible result.

I suspect, but do not actually know, that nobody really uses . ## foo
unless foo is a macro argument, because when foo is not a macro
argument, there is no point in using ## in the first place. It would
explain why this went unnoticed for so long. And I can imagine a few
cases where this would be useful: assembler directives that have subtly
different syntax, depending on the assembler used. x y would be
insufficient if x is ., if y is size, and if .size is a directive, but .
size is a syntax error. ## would work here.

Even if clang's behaviour should change, though, my patch does not have
adequate testing for these special cases, so shouldn't be applied as is.
Should I work on better tests, or do you think it is more likely that
the behaviour should remain unchanged and get decent testing?

Cheers,
Harald van Dijk