PATCH: In -traditional mode, ignore token pasting and stringification (PR16371)

Mon Jul 8 13:17:03 PDT 2013

On Fri, Jul 5, 2013 at 5:57 PM, Austin Seipp <aseipp at pobox.com> wrote:
> Well, just to be clear, there's absolutely no intention of completely
> emulating -traditional's behavior. We don't need full emulation, just
> [Feature X].

This is exactly the argument that was used the last time a feature was
added to our -traditional-cpp implementation. If we keep incrementally
adding features, it seems very likely that we'll end up with a
poorly-designed implementation, and no point along the way where we
could say "at this point we stop and rewrite".

> The example I brought up here about macro expansion inside string literals
> was to point out that this patch 'implements' behavior found in GCC's
> -traditional mode, but with an exception. Of course, that's really all
> Clang's -traditional mode is anyway: a small collection of *some* of
> GCC's behaviors, with caveats even at that. So, with that considered, I
> think this is fine: the patch has relatively small impact/scope, and is
> pretty simple on top of that.

It seems unfortunate for neither stringization nor macro expansion
into string literals to work. That said, if anyone actually cares
about the latter, maybe we can persuade them to design and implement a
proper traditional preprocessor.

> On Fri, Jul 5, 2013 at 6:50 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Fri, Jul 5, 2013 at 3:44 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>>
>>> Hello,
>>>
>>> Attached is a patch that makes the preprocessor ignore token pasting
>>> (##) and stringification (#) when in -traditional mode. This makes it
>>> behave more like GCC[1].
>>>
>>> This change fixes PR16371, and is needed for Clang to function
>>> properly as a preprocessor for Haskell (in the Glasgow Haskell
>>> Compiler). If you're curious and look at the bug, you'll see that I
>>> made some incorrect assumptions about the behavior of -traditional for
>>> GCC (and attached a bad patch), but this fixes the problem in a
>>> principled way. And the patch is simpler, which is good too.
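
For reference, here is a small standard-C sketch of the two operators the patch disables in -traditional mode: # turns a macro argument into a string literal, and ## pastes two tokens into one. The names STR, CAT, foo_bar, foo_baz, stringified, pasted_sum and check_operators are hypothetical, chosen only for illustration; the code shows ordinary ISO C behavior, not anything specific to Clang's -traditional-cpp implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros illustrating the two ISO C operators. */

/* Stringification: #x turns the argument's tokens into a string literal;
   the argument is not macro-expanded before being stringified. */
#define STR(x) #x

/* Token pasting: a ## b joins two tokens into a single token. */
#define CAT(a, b) a ## b

/* Hypothetical functions whose names are built by pasting. */
static int foo_bar(void) { return 1; }
static int foo_baz(void) { return 2; }

/* STR(a + b) expands to the string literal "a + b". */
static const char *stringified(void) {
    return STR(a + b);
}

/* CAT(foo_, bar) becomes the identifier foo_bar, so this evaluates
   foo_bar() + foo_baz(), i.e. 1 + 2. */
static int pasted_sum(void) {
    return CAT(foo_, bar)() + CAT(foo_, baz)();
}

/* Checks both behaviors; returns normally only if they hold. */
static void check_operators(void) {
    assert(strcmp(stringified(), "a + b") == 0);
    assert(pasted_sum() == 3);
}
```

Calling check_operators() from a main() exercises both operators under a standard (non-traditional) preprocessor. The patch's stated intent is that in -traditional mode these operators are ignored rather than applied, in line with GCC.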

There's something distasteful about the whole approach here. Haskell's
lexing rules are not the same as C or C++'s. It's wrong to use a
standard C preprocessor in Haskell (for instance, ' will be
mistreated, and with GHC extensions so will #), and it's also wrong to
use a traditional C preprocessor (for instance, macros will be
expanded inside string literals).

This is not the only tweak you'll need to get Clang's preprocessor to
preprocess Haskell properly. For instance, consider:

  MACRO(foo') + MACRO(foo')

A proper Haskell preprocessor would treat foo' as a single token.
Clang will treat ') + MACRO(foo' as a single (character literal) token.

Have you considered using cpphs instead?

>> Completely emulating -traditional would be crazy with our current lexer and
>> parser implementation; I think the only implementation we would accept would
>> be implementing it from scratch, independent from the current Lexer.  (There
>> wouldn't be much code duplication given how different the semantics of
>> traditional preprocessing are, and we could simplify the implementation by
>> assuming it's only used for preprocessed output.)
>>
>> Richard, do you have an opinion on this patch?  You've expressed some
>> concerns about -traditional-cpp before.

I agree with your comments; implementing a fundamentally
character-based preprocessor on top of our current token-based
preprocessor isn't the right approach in the long term. But
pragmatically, this is a small point fix to disable a feature that
should not be enabled in our existing token-based traditional
preprocessor, so I don't think this is a big deal. If there really is
a demand for it, I find that more compelling than the slippery-slope
argument.