[cfe-commits] [patch] Unicode character literals for UTF-8 source encoding

Thu Jan 12 16:22:17 PST 2012

On Wed, Jan 11, 2012 at 8:35 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
> Alright, characters whose encoding can't be represented as a single code unit (a single char for ordinary literals, a single wchar_t for wide ones) are now disallowed in character literals.
>
> So now '\u2031' is not allowed (not even in C, where the literal has type int, which could represent the value), and L'\U00010000' is not allowed either. Also, replacing these UCNs with the actual characters results in exactly the same behavior.

Okay, that works.

+  if (!HadError && (multi_char_too_long || available_bits < needed_bits)) {
+    PP.Diag(Loc,diag::warn_char_constant_too_large);

Are there actually any cases where "available_bits < needed_bits" is
true in the current version of your patch?

-Eli

>
> On Jan 10, 2012, at 3:59 PM, Eli Friedman wrote:
>
>> On Tue, Jan 10, 2012 at 4:05 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>> whoops, that should be "anything that indicates '\U0010FFFD' isn't perfectly valid"
>>>
>>> Accepting larger Unicode escapes is not new with this patch. I tried the clang installed with Xcode 4.2 (Apple clang version 3.0 (tags/Apple/clang-211.12) (based on LLVM 3.0svn)), and `int i = '\U001F306';` gives i the value 0x001F306. Although I don't have a use case or anything, my preference is to allow the larger Unicode escapes.
>>>
>>> If you want them excluded, just let me know the ranges.
>>
>> Accepting it and doing something different from gcc seems likely to
>> cause issues if someone is accidentally depending on gcc's behavior.
>> I think we should either reject it or do the same thing as gcc.
>>
>> -Eli