[PATCH] Whitespace issues during preprocessing

Wed Feb 5 14:36:40 PST 2014

On 04/02/14 22:24, Richard Smith wrote:
> On Tue, Feb 4, 2014 at 1:13 PM, Harald van Dijk <harald at gigawatt.nl
> <mailto:harald at gigawatt.nl>> wrote:
> 
>     On 04/02/14 20:25, Justin Bogner wrote:
>     > Harald van Dijk <harald at gigawatt.nl <mailto:harald at gigawatt.nl>>
>     writes:
>     >> Attached are my updated patches intended to fix various whitespace
>     >> issues in clang, modified to take Justin Bogner's comments into
>     account.
>     >> They are intended to ensure that whitespace is not inappropriately
>     >> removed just because a macro or macro argument expansion is
>     empty, and
>     >> does get removed during token pasting.
>     >
>     > I've committed the first four patches for you: r200785 through
>     r200788.
> 
>     Thanks!
> 
>     >> I have moved the handling of invalid token pastes from
>     >> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that
>     it works
>     >> for both object- and function-like macros, and both when ##'s
>     operands
>     >> use macro parameters and when they don't.
>     >
>     > Given that the tests needed to be changed and the behaviour clearly
>     > hasn't followed the comment in a while, I'm not entirely convinced
>     this
>     > is the right thing to do. Could the comment simply be wrong? Are
>     people
>     > relying on one behaviour or the other for invalid token pastes in
>     > assembler-with-cpp?
>     >
>     > Basically, is there a way to objectively say one of these
>     behaviours is
>     > better than the other?
> 
>     Having looked closer, the current behaviour is inconsistent in a way
>     that cannot really be explained to someone not familiar with a few
>     implementation details. Given
> 
>     #define foo(x) (. ## x . ## y)
> 
>     foo(y) actually expands to (.y . y) in assembler-with-cpp mode, with or
>     without my first four patches. That first result is what the comment
>     refers to; could the fact that the second result is different be an
>     oversight? There does not seem to be a reason for this difference, at
>     least. Surprisingly though, this is exactly what GCC does too.
> 
>     My last patch would have turned this into (.y .y), removing the second
>     space. That makes sense to me. (. y . y) could also be a perfectly
>     sensible result.
> 
> 
> I think (.y .y) is probably the best answer here (consistently remove
> whitespace on both sides of ## wherever it appears) -- this also
> probably better matches what MSVC does (where token-paste can result in
> multiple tokens, and seems to act as if the tokens were merely abutted
> with no adjacent whitespace). This would also behave more consistently
> when processing languages with a different lexical structure from C --
> in a language where an identifier can start with a period, (.y .y) seems
> to unambiguously be the right answer.

That is a good point about MSVC. There is actually one test case
affected by this change, using -fms-extensions, where the behaviour did
not match that of MSVC, and does now, at least for the compiler that
comes with Visual Studio 2013.

>     I suspect, but do not actually know, that nobody really uses . ## foo
>     unless foo is a macro argument, because when foo is not a macro
>     argument, there is no point in using ## in the first place. It would
>     explain why this went unnoticed for so long. And I can imagine a few
>     cases where this would be useful: assembler directives that have subtly
>     different syntax, depending on the assembler used. x y would be
>     insufficient if x is ., if y is size, and if .size is a directive, but .
>     size is a syntax error. ## would work here.
> 
>     Even if clang's behaviour should change, though, my patch does not have
>     adequate testing for these special cases, so shouldn't be applied as is.
>     Should I work on better tests, or do you think it is more likely that
>     the behaviour should remain unchanged and get decent testing?
> 
> 
> There's a risk of breaking someone's assumption by making a change here
> (especially since we'd be introducing a difference from GCC) but I think
> the new behavior is much more defensible, and there's probably no way to
> find out without trying it.

All right. I have now taken a closer look at the test cases where the
behaviour changes, and noticed that for blargh ## $foo -> blargh $foo it
would not be right to simply change the test case to blargh$foo, as that
misses the point of the test.

There was also already a test for . ## foo, but it only tested the case
where foo is a macro argument. I have extended that test to also check
what happens when foo is not a macro argument.

How does the attached patch look? I've re-tested it on sources of today,
but on top of my patch in the "r200787 - Fix whitespace handling in
empty macro expansions" thread (after your okay for that).

Cheers,
Harald van Dijk
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Fix-handling-of-whitespace-in-invalid-token-pastes.patch
Type: text/x-patch
Size: 4533 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140205/7ea19dcf/attachment.bin>