[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Sun Jul 31 01:30:26 PDT 2011

On Sun, Jul 31, 2011 at 12:58 AM, Sean Hunt <scshunt at csclub.uwaterloo.ca> wrote:
> On Sun, Jul 31, 2011 at 00:48, Eli Friedman <eli.friedman at gmail.com> wrote:
> So I've got a couple questions.
>>
>> >
>> > Is the lexer really the appropriate place to be doing this? Originally
>> > CodeGenModule::GetStringForStringLiteral seemed like the thing I should be
>> > modifying, but I discovered that the string literal's bytes had already been
>> > zero extended by the time it got there. Would it be reasonable for the
>> > StringLiteralParser to just produce a UTF-8 encoded internal representation
>> > of the string and leave producing the final representation until later? I
>> > think the main complication with that is that I'll have to encode UCNs with
>> > their UTF-8 representation.
>>
>> Given the possibility of character escapes which can't be represented
>> in UTF-8, I'm not sure we can...
>
> What possibility is this? \UFFFFFFFF is far from valid, and no other
> character escape can get anywhere near that high.

L"\xFFFFFFFF" is allowed, though...

-Eli
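The trade-off in this exchange can be sketched as follows. The quoted proposal would have the StringLiteralParser store a UTF-8 internal representation, so every UCN would pass through a code-point-to-UTF-8 encoder. The sketch below assumes the RFC 3629 limits (scalar values up to U+10FFFF, no surrogates). `encodeUTF8` is a hypothetical helper for illustration, not Clang's actual code. Any value outside that range has no UTF-8 encoding. That includes the 0xFFFFFFFF that `L"\xFFFFFFFF"` produces on a target with a 32-bit `wchar_t`, so such a value cannot round-trip through a UTF-8 intermediate form.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not Clang's API): encode one code point as UTF-8.
// Returns std::nullopt when the value is not a Unicode scalar value,
// i.e. it is a surrogate (U+D800..U+DFFF) or above U+10FFFF (RFC 3629).
std::optional<std::string> encodeUTF8(std::uint32_t cp) {
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return std::nullopt;  // e.g. 0xFFFFFFFF from L"\xFFFFFFFF" lands here
  std::string out;
  if (cp < 0x80) {            // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                    // 4 bytes: 11110xxx then three 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

For example, a UCN such as `\u00E9` encodes to the two bytes `C3 A9`, while `encodeUTF8(0xFFFFFFFF)` yields no value. A lexer-level design that keeps the target's code units directly avoids this failure mode, at the cost of doing the target-specific widening early.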