PATCH: In -traditional mode, ignore token pasting and stringification (PR16371)

Mon Jul 8 17:24:21 PDT 2013

On Mon, Jul 8, 2013 at 3:17 PM, Richard Smith <richard at metafoo.co.uk> wrote:
> On Fri, Jul 5, 2013 at 5:57 PM, Austin Seipp <aseipp at pobox.com> wrote:
>> Well, just to be clear, there's absolutely no intention of completely
>> emulating -traditional's behavior. We don't need full emulation, just
>> [Feature X].
>
> This is exactly the argument that was used last time a feature was
> added to our -traditional-cpp implementation. If we keep incrementally
> adding features, it seems very likely that we'll end up with a
> poorly-designed implementation, and no point along the way where we
> could say "at this point we stop and rewrite".

Point taken.

>> The example I brought up here about expansions in literal quotations
>> was to point out that this patch 'implements' behavior found in GCC's
>> -traditional mode, but with an exception. Of course, that's really all
>> Clang's -traditional mode is anyway: a small collection of *some* of
>> GCC's behaviors, with caveats even at that. So, that considered, I think
>> this is fine: the patch has relatively small impact/scope, and is
>> pretty simple on top of that.
>
> It seems unfortunate for neither stringization nor macro expansion
> into string literals to work. That said, if anyone actually cares
> about the latter, maybe we can persuade them to design and implement a
> proper traditional preprocessor.
>
>> On Fri, Jul 5, 2013 at 6:50 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>> On Fri, Jul 5, 2013 at 3:44 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Attached is a patch that makes the preprocessor ignore token pasting
>>>> (##) and stringification (#) when in -traditional mode. This makes it
>>>> behave more like GCC[1].
>>>>
>>>> This change fixes PR16371, and is needed for Clang to function
>>>> properly as a preprocessor for Haskell (in the Glasgow Haskell
>>>> Compiler). If you're curious and look at the bug, I made some
>>>> incorrect assumptions about the behavior of -traditional for GCC (and
>>>> attached a bad patch), but this fixes the problem in the principled
>>>> way. And the patch is simpler, which is good too.
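
For reference, a minimal sketch of what the two operators do in standard
(non-traditional) mode. STR, CAT and the two helper functions are made-up
names for illustration only; none of this is taken from the patch. As I
understand it, in GCC's -traditional mode the # and ## in these macro
bodies are not treated as operators at all, which is the behaviour the
patch aims to match.

```c
#include <assert.h>
#include <string.h>

/* Standard C: '#' turns a macro argument into a string literal. */
#define STR(x) #x

/* Standard C: '##' pastes two tokens into a single token. */
#define CAT(a, b) a##b

/* Hypothetical helper: returns the result of stringifying foo. */
static const char *stringified(void) {
    return STR(foo);        /* expands to the string literal "foo" */
}

/* Hypothetical helper: CAT(va, lue) names the local variable 'value'. */
static int pasted(void) {
    int value = 42;
    return CAT(va, lue);    /* expands to the identifier value */
}
```

Haskell source that uses # or ## as ordinary operator characters inside a
macro body would be rewritten by either expansion, which is why ignoring
both in -traditional mode matters for GHC.
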
>
> There's something distasteful about the whole approach here. Haskell's
> lexing rules are not the same as C or C++'s. It's wrong to use a
> standard C preprocessor in Haskell (for instance, ' will be
> mistreated, and with GHC extensions so will #), and it's also wrong to
> use a traditional C preprocessor (for instance, macros will be
> expanded inside string literals).

(I don't disagree that our current situation is a wee bit unfortunate,
just for the record.)

> This is not the only tweak you'll need to get Clang's preprocessor to
> preprocess Haskell properly. For instance, consider:
>
>   MACRO(foo') + MACRO(foo')
>
> A proper Haskell preprocessor would treat foo' as a single token.
> Clang will treat ') + MACRO(foo' as a single token.
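
To make the lexing issue concrete, here is a rough model of how a C-style
lexer finds the end of a character constant. It is a simplified sketch,
not Clang's actual lexer, and c_char_token_len is a hypothetical name:
the token starts at an opening ' and runs to the next unescaped ' on the
same line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model of C-style lexing of a character constant: starting at
 * the opening quote s[start], the token runs up to and including the next
 * unescaped ' on the same line. Returns the token length in bytes, or 0 if
 * s[start] is not a quote or no closing quote is found.
 */
static size_t c_char_token_len(const char *s, size_t start) {
    if (s[start] != '\'')
        return 0;
    for (size_t i = start + 1; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] == '\\' && s[i + 1] != '\0') {
            i++;                    /* skip the escaped character */
        } else if (s[i] == '\'') {
            return i - start + 1;   /* include the closing quote */
        }
    }
    return 0;
}
```

Run on the line quoted above, starting at the first quote, this yields the
15-character span ') + MACRO(foo' as one token, whereas Haskell lexing
would end the identifier foo' at that first quote.
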

Well, this situation is pretty unlikely as it stands because GCC
doesn't correctly lex this either. End-users can force GHC to use
something like cpphs (which has the correct behavior), but very few
packages actually do this, while the number of packages that use the
preprocessor itself is very high. So really, lexing rules are very
rarely violated in such a way. Which leads to...

> Have you considered using cpphs instead?

Yes, and there have been some ideas of writing a proper traditional
preprocessor library and integrating it into GHC (cpphs is GPL, which
is off limits from an API standpoint). However, in light of this work
(which is not your problem, and a longer-term thing for us), I merely
attempted to make GHC use Clang, and discovered this deficiency in
-traditional's behaviour. And so the story leads us here...

I was also unaware of the NetBSD tradcpp work which Joerg mentioned,
which is also promising as a possible replacement for the longer term.

> I agree with your comments; implementing a fundamentally
> character-based preprocessor on top of our current token-based
> preprocessor isn't the right approach in the long term. But
> pragmatically, this is a small point fix to disable a feature that
> should not be enabled in our existing token-based traditional
> preprocessor, so I don't think this is a big deal. If there really is
> a demand for it, I find that more compelling than the slippery-slope
> argument.

In any case, I appreciate the timely feedback and review. Thanks.

-- 
Regards,
Austin - PGP: 4096R/0x91384671