[PATCH] Whitespace issues during preprocessing

Richard Smith richard at metafoo.co.uk
Wed Feb 5 15:13:07 PST 2014


On Wed, Feb 5, 2014 at 2:36 PM, Harald van Dijk <harald at gigawatt.nl> wrote:

> On 04/02/14 22:24, Richard Smith wrote:
> > On Tue, Feb 4, 2014 at 1:13 PM, Harald van Dijk <harald at gigawatt.nl
> > <mailto:harald at gigawatt.nl>> wrote:
> >
> >     On 04/02/14 20:25, Justin Bogner wrote:
> >     > Harald van Dijk <harald at gigawatt.nl <mailto:harald at gigawatt.nl>>
> >     writes:
> >     >> Attached are my updated patches intended to fix various whitespace
> >     >> issues in clang, modified to take Justin Bogner's comments into
> >     account.
> >     >> They are intended to ensure that whitespace is not inappropriately
> >     >> removed just because a macro or macro argument expansion is
> >     empty, and
> >     >> does get removed during token pasting.
> >     >
> >     > I've committed the first four patches for you: r200785 through
> >     r200788.
> >
> >     Thanks!
> >
> >     >> I have moved the handling of invalid token pastes from
> >     >> TokenLexer::ExpandFunctionArguments to TokenLexer::Lex, so that
> >     it works
> >     >> for both object- and function-like macros, and both when ##'s
> >     operands
> >     >> use macro parameters and when they don't.
> >     >
> >     > Given that the tests needed to be changed and the behaviour clearly
> >     > hasn't followed the comment in a while, I'm not entirely convinced
> >     this
> >     > is the right thing to do. Could the comment simply be wrong? Are
> >     people
> >     > relying on one behaviour or the other for invalid token pastes in
> >     > assembler-with-cpp?
> >     >
> >     > Basically, is there a way to objectively say one of these
> >     behaviours is
> >     > better than the other?
> >
> >     Having looked closer, the current behaviour is inconsistent in a way
> >     that cannot really be explained to someone not familiar with a few
> >     implementation details. Given
> >
> >     #define foo(x) (. ## x . ## y)
> >
> >     foo(y) actually expands to (.y . y) in assembler-with-cpp mode, with
> or
> >     without my first four patches. That first result is what the comment
> >     refers to; could the fact that the second result is different be an
> >     oversight? There does not seem to be a reason for this difference, at
> >     least. Surprisingly though, this is exactly what GCC does too.
> >
> >     My last patch would have turned this into (.y .y), removing the
> second
> >     space. That makes sense to me. (. y . y) could also be a perfectly
> >     sensible result.
> >
> >
> > I think (.y .y) is probably the best answer here (consistently remove
> > whitespace on both sides of ## wherever it appears) -- this also
> > probably better matches what MSVC does (where token-paste can result in
> > multiple tokens, and seems to act as if the tokens were merely abutted
> > with no adjacent whitespace). This would also behave more consistently
> > when processing languages with a different lexical structure from C --
> > in a language where an identifier can start with a period, (.y .y) seems
> > to unambiguously be the right answer.
>
> That is a good point about MSVC. There is actually one test case
> affected by this change, using -fms-extensions, where the behaviour did
> not match that of MSVC, and does now, at least for the compiler that
> comes with Visual Studio 2013.
>
> >     I suspect, but do not actually know, that nobody really uses . ## foo
> >     unless foo is a macro argument, because when foo is not a macro
> >     argument, there is no point in using ## in the first place. It would
> >     explain why this went unnoticed for so long. And I can imagine a few
> >     cases where this would be useful: assembler directives that have
> subtly
> >     different syntax, depending on the assembler used. x y would be
> >     insufficient if x is ., if y is size, and if .size is a directive,
> but .
> >     size is a syntax error. ## would work here.
> >
> >     Even if clang's behaviour should change, though, my patch does not
> have
> >     adequate testing for these special cases, so shouldn't be applied as
> is.
> >     Should I work on better tests, or do you think it is more likely that
> >     the behaviour should remain unchanged and get decent testing?
> >
> >
> > There's a risk of breaking someone's assumption by making a change here
> > (especially since we'd be introducing a difference from GCC) but I think
> > the new behavior is much more defensible, and there's probably no way to
> > find out without trying it.
>
> All right. I have now taken a closer look at the test cases where the
> behaviour changes, and noticed that for blargh ## $foo -> blargh $foo it
> would not be right to simply change the test case to blargh$foo, as that
> misses the point of the test.
>

Right, I see, it's just making sure that -fdollars-in-identifiers allows
the paste to work. I like your fix to the test here.


> There was also already a test for . ## foo, but it only tested the case
> where foo is a macro argument. I have extended that test to also check
> what happens when foo is not a macro argument.
>

Can you also add a test for the case where the . comes from a macro
argument? May as well be thorough =)


> How does the attached patch look? I've re-tested it on sources of today,
> but on top of my patch in the "r200787 - Fix whitespace handling in
> empty macro expansions" thread (after your okay for that).


Checking whether the previous token was a ## worries me a little. What
about cases like this:

#define foo(x, y) x####y
foo(, z)

This expands to "## z" under our current left-to-right pasting strategy,
but I think your patch drops the space. Amusingly, "##z" appears to be the
right answer under a right-to-left pasting strategy, so maybe that's OK?
Both GCC and EDG expand this to just "z"; the standard is not spectacularly
clear here, but I think we're right, because a ## produced by pasting a
placemarker token onto a ## is not a token in the replacement list (that
token is already gone).

If you want to say that we don't care about this case, I could live with
that =)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140205/d6bec916/attachment.html>


More information about the cfe-commits mailing list