[cfe-commits] [PATCH] Support for universal character names in identifiers

Wed Dec 19 16:24:30 PST 2012

On Wed, Dec 19, 2012 at 1:18 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Tue, Dec 18, 2012 at 11:01 PM, Chris Lattner <clattner at apple.com> wrote:
>>
>> On Dec 18, 2012, at 8:40 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>
>>>>> Oh, I see... so the idea is to hack up getCharAndSize instead of
>>>>> calling isUCNAfterSlash/ConsumeUCNAfterSlash where we expect a UCN,
>>>>> use a marker which essentially means "saw a UCN".
>>>>>
>>>>> Seems like a workable approach; I don't think it actually helps any
>>>>> with error recovery (I'm pretty sure we can't diagnose anything
>>>>> without knowing what kind of token we're forming), but I think it will
>>>>> make the patch simpler.  I'll try to hack up a new version of my
>>>>> patch.
>>>>
>>>> Attached.
>>>
>>> And, I've discovered a rather large weakness of this approach:
>>> actually writing a correct implementation of getCharAndSizeSlow which
>>> returns a special value for UCNs is painful at best.  I might have to
>>> abandon this route.
>>
>> How terrible would it be to make getChar* return a uint32_t codepoint?  Would that fix the problem?
>
> That doesn't even help; the issue is that checking for a UCN itself
> requires calling getCharAndSize

Why? Behavior is undefined if line splicing produces a UCN, and UCNs
can't contain trigraphs.
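
To make the point above concrete: if a UCN can be formed neither by line
splicing nor with trigraphs inside it, then a check for one could look at the
raw buffer bytes directly instead of going through getCharAndSize. A minimal
sketch under that assumption follows. The names scanRawUCN and hexValue are
hypothetical and are not part of Clang's Lexer; the sketch also assumes the
leading backslash is spelled literally, and it does not check whether the
resulting code point is one that is allowed in an identifier.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: value of a hex digit, or -1 if C is not one.
static int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Hypothetical helper, not Clang's API. Returns the number of raw bytes in a
// UCN (\uXXXX or \UXXXXXXXX) starting at Ptr, or 0 if there is none. It reads
// the buffer directly, so an escaped newline or a trigraph in the middle of
// the sequence simply makes the match fail.
std::size_t scanRawUCN(const char *Ptr, const char *End,
                       std::uint32_t &CodePoint) {
  assert(Ptr <= End && "invalid buffer range");
  if (End - Ptr < 2 || Ptr[0] != '\\')
    return 0;

  std::size_t NumDigits;
  if (Ptr[1] == 'u')
    NumDigits = 4;
  else if (Ptr[1] == 'U')
    NumDigits = 8;
  else
    return 0;

  if (static_cast<std::size_t>(End - Ptr) < 2 + NumDigits)
    return 0;

  std::uint32_t Value = 0;
  for (std::size_t I = 0; I != NumDigits; ++I) {
    int Digit = hexValue(Ptr[2 + I]);
    if (Digit < 0)
      return 0; // Not a hex digit: no UCN here.
    Value = (Value << 4) | static_cast<std::uint32_t>(Digit);
  }

  CodePoint = Value; // Only written when a full UCN was matched.
  return 2 + NumDigits;
}
```

A caller would try scanRawUCN first when it sees a backslash and fall back to
the normal slow path when it returns 0.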