PATCH: In -traditional mode, ignore token pasting and stringification (PR16371)

Mon Jul 8 18:02:17 PDT 2013

On Mon, Jul 8, 2013 at 5:24 PM, Austin Seipp <aseipp at pobox.com> wrote:
> On Mon, Jul 8, 2013 at 3:17 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> On Fri, Jul 5, 2013 at 5:57 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>> Well, just to be clear, there's absolutely no intention of completely
>>> emulating -traditional's behavior. We don't need full emulation, just
>>> [Feature X].
>>
>> This is exactly the argument that was used last time a feature was
>> added to our -traditional-cpp implementation. If we keep incrementally
>> adding features, it seems very likely that we'll end up with a
>> poorly-designed implementation, and no point along the way where we
>> could say "at this point we stop and rewrite".
>
> Point taken.
>
>>> The example I brought up here about expansions in literal quotations
>>> was to point out that this patch 'implements' behavior found in GCC's
>>> -traditional mode, but with an exception. Of course, that's really all
>>> Clang's -traditional mode is anyway: a small collection of *some* of
>>> GCC's behaviors, with caveats even at that. So, that considered, I think
>>> this is fine: the patch has relatively small impact/scope, and is
>>> pretty simple on top of that.
>>
>> It seems unfortunate for neither stringization nor macro expansion
>> into string literals to work. That said, if anyone actually cares
>> about the latter, maybe we can persuade them to design and implement a
>> proper traditional preprocessor.
>>
>>> On Fri, Jul 5, 2013 at 6:50 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>> On Fri, Jul 5, 2013 at 3:44 PM, Austin Seipp <aseipp at pobox.com> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> Attached is a patch that makes the preprocessor ignore token pasting
>>>>> (##) and stringification (#) when in -traditional mode. This makes it
>>>>> behave more like GCC[1].
>>>>>
>>>>> This change fixes PR16371, and is needed for Clang to function
>>>>> properly as a preprocessor for Haskell (in the Glasgow Haskell
>>>>> Compiler). If you're curious and look at the bug, I made some
>>>>> incorrect assumptions about the behavior of -traditional for GCC (and
>>>>> attached a bad patch), but this fixes the problem in the principled
>>>>> way. And the patch is simpler, which is good too.
>>
>> There's something distasteful about the whole approach here. Haskell's
>> lexing rules are not the same as C or C++'s. It's wrong to use a
>> standard C preprocessor in Haskell (for instance, ' will be
>> mistreated, and with GHC extensions so will #), and it's also wrong to
>> use a traditional C preprocessor (for instance, macros will be
>> expanded inside string literals).
>
> (I don't disagree that our current situation is a wee bit unfortunate,
> just for the record.)
>
>> This is not the only tweak you'll need to get Clang's preprocessor to
>> preprocess Haskell properly. For instance, consider:
>>
>>   MACRO(foo') + MACRO(foo')
>>
>> A proper Haskell preprocessor would treat foo' as a single token.
>> Clang will treat ') + MACRO(foo' as a single token.
>
> Well, this situation is pretty unlikely as it stands because GCC
> doesn't correctly lex this either. End-users can force GHC to use
> something like cpphs (which has the correct behavior), but very few
> packages actually do this, while the number of packages that use the
> preprocessor itself is very high. So really, lexing rules are very
> rarely violated in such a way. Which leads to...
>
>> Have you considered using cpphs instead?
>
> Yes, and there have been some ideas of writing a proper traditional
> preprocessor library and integrating it into GHC (cpphs is GPL, which
> is off limits from an API standpoint). However, in light of this work
> (which is not your problem, and a longer term thing for us) I merely
> attempted to make GHC use Clang, and discovered this deficiency in
> -traditional's behaviour. And so the story leads us here...
>
> I was also unaware of the NetBSD tradcpp work which Joerg mentioned,
> which is also promising as a possible replacement for the longer term.
>
>> I agree with your comments; implementing a fundamentally
>> character-based preprocessor on top of our current token-based
>> preprocessor isn't the right approach in the long term. But
>> pragmatically, this is a small point fix to disable a feature that
>> should not be enabled in our existing token-based traditional
>> preprocessor, so I don't think this is a big deal. If there really is
>> a demand for it, I find that more compelling than the slippery-slope
>> argument.
>
> In any case, I appreciate the timely feedback and review. Thanks.

Patch committed as r185896.
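
For anyone reading this thread later, here is a rough sketch of the two
behaviors discussed above. It is only an illustration, not code from the
patch. The first part shows what '#' and '##' do in standard mode, which is
what the patch is described as turning off under -traditional. The second
part mimics how a C-style lexer pairs apostrophes on the MACRO(foo') line
quoted above. The names STR, CAT, stringified, pasted and
c_char_literal_end are all made up for this example, and the helper is not
how Clang's lexer is actually written.

```c
#include <stddef.h>
#include <string.h>

/* Standard (ISO C) mode: '#' turns a macro argument into a string literal
   and '##' pastes two tokens into one. Per the patch description above,
   -traditional mode is meant to ignore both operators. */
#define STR(x) #x
#define CAT(a, b) a##b

/* CAT(foo, bar) pastes into the single identifier foobar. */
static int CAT(foo, bar) = 42;

/* Returns the string literal produced by stringification. */
const char *stringified(void) { return STR(foo bar); }

/* Reads the variable whose name was produced by token pasting. */
int pasted(void) { return foobar; }

/* Hypothetical helper, for illustration only: given the index of an
   apostrophe, return the index of the next apostrophe in the string, the
   way a C-style lexer looks for the end of a character constant.
   Returns (size_t)-1 if there is none. Backslash escapes are ignored to
   keep the sketch short. */
size_t c_char_literal_end(const char *s, size_t open) {
    const char *close = strchr(s + open + 1, '\'');
    return close ? (size_t)(close - s) : (size_t)-1;
}
```

On the input MACRO(foo') + MACRO(foo'), calling the helper at the first
apostrophe lands on the second one, so the span in between is the
') + MACRO(foo' token mentioned above rather than two separate foo'
identifiers.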