[cfe-commits] [patch] Unicode character literals for UTF-8 source encoding

Wed Jan 11 21:03:39 PST 2012

On Jan 11, 2012, at 11:35 PM, Seth Cantrell wrote:

> Alright, characters for which the appropriate encoding can't be represented as a single value of the appropriate type are now disallowed in character literals.
> 
> So now '\u2031' is not allowed (not even in C, where the literal has type int, which could represent the value), and L'\U00010000' is not allowed. Also, replacing these UCNs with the actual characters results in exactly the same behavior.

L'\U00010000' isn't allowed with -fshort-wchar, that is. Under normal circumstances it's fine.
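
To make the rule concrete, here is a rough sketch of the check as I understand it from the description above. It is not the code from the patch: the names CharLiteralKind, maxSingleUnitCodePoint, and fitsInOneCodeUnit are made up for illustration, and it assumes a UTF-8 narrow execution encoding, UTF-16 for a 16-bit wchar_t, and UTF-32 for a 32-bit wchar_t.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; these are not Clang's internals.
enum class CharLiteralKind {
    Narrow,  // 'x': 8-bit code units, UTF-8 assumed
    Wide16,  // L'x' with -fshort-wchar: 16-bit code units, UTF-16 assumed
    Wide32   // L'x' with a 32-bit wchar_t: UTF-32 assumed
};

// Largest code point whose encoding is exactly one code unit for this kind.
// UTF-8 needs more than one byte above U+007F, and UTF-16 needs a surrogate
// pair above U+FFFF.
constexpr std::uint32_t maxSingleUnitCodePoint(CharLiteralKind kind) {
    return kind == CharLiteralKind::Narrow ? 0x7Fu
         : kind == CharLiteralKind::Wide16 ? 0xFFFFu
                                           : 0x10FFFFu;
}

// True if the code point can appear in a character literal of this kind
// under the single-code-unit rule. Surrogate code points (U+D800..U+DFFF)
// are assumed to have been rejected earlier, since a UCN cannot name them.
constexpr bool fitsInOneCodeUnit(std::uint32_t codePoint, CharLiteralKind kind) {
    return codePoint <= maxSingleUnitCodePoint(kind);
}
```

Under these assumptions U+2031 fails the narrow check, and U+10000 fails only the 16-bit wide check, which matches the two cases above.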

> 
> - Seth
> 
> 
> On Jan 10, 2012, at 3:59 PM, Eli Friedman wrote:
> 
>> On Tue, Jan 10, 2012 at 4:05 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>> whoops, that should be "anything that indicates '\U0010FFFD' isn't perfectly valid"
>>> 
>>> Accepting larger Unicode escapes is not new with this patch. I tried the clang installed with Xcode 4.2, Apple clang version 3.0 (tags/Apple/clang-211.12) (based on LLVM 3.0svn), and `int i = '\U0001F306';` gives i the value 0x1F306. Although I don't have a use case or anything, my preference is to allow the larger Unicode escapes.
>>> 
>>> If you want them excluded just let me know the ranges.
>> 
>> Accepting it and doing something different from gcc seems likely to
>> cause issues if someone is accidentally depending on gcc's behavior.
>> I think we should either reject it or do the same thing as gcc.
>> 
>> -Eli
> <0001-Improves-support-for-Unicode-in-character-literals.patch><0002-Fix-char-literal-types-in-C.patch><0003-stop-claiming-unicode-escape-sequences-are-too-long-.patch><0004-Add-and-update-tests-for-character-literals.patch>