[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Mon Oct 24 20:16:35 PDT 2011

On Mon, Oct 24, 2011 at 8:01 PM, Douglas Gregor <dgregor at apple.com> wrote:
>
> On Oct 24, 2011, at 7:13 PM, Eli Friedman wrote:
>
>> On Mon, Oct 24, 2011 at 6:43 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>>
>>> Consider the literals L"\xD83D\xDC7F" and L"\U0001F47F" (the former being the UTF-16 surrogate pair corresponding to the latter). For a Windows target, these two literals are indistinguishable after we do the initial translation in Sema::ActOnStringLiteral: the resulting StringLiteral will store the same data and have the same Kind and CharByteWidth. But on a platform with sizeof(wchar_t)==4, the two will not be the same.
>>
>> That isn't what I meant; it's okay for the serialized AST to vary
>> across targets.  It just shouldn't vary across hosts.
>
>
> FWIW, serialization does vary across hosts, because we don't take any care whatsoever to deal with endianness issues in the AST reader/writer (and we tend to read integers from mmap'd files).
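
To illustrate the host-endianness point in the quoted text: a minimal sketch, not taken from Clang's AST reader/writer, contrasting a read that copies bytes straight out of a buffer (as one might from an mmap'd file) with a read that fixes the byte order explicitly. The function names readHostOrder and readLittleEndian are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Host-dependent read: copies the four bytes as they are laid out in the
// buffer, so the resulting value depends on the host's byte order.
std::uint32_t readHostOrder(const unsigned char *p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Host-independent read: assembles the value assuming the buffer was
// written in little-endian order, whatever the host's byte order is.
std::uint32_t readLittleEndian(const unsigned char *p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Given the bytes 7F F4 01 00, readLittleEndian yields 0x0001F47F on any host, while readHostOrder yields that value only on a little-endian host.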

Okay; then I guess we don't need to worry about either issue. :)

-Eli
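
For reference, the relationship between the two literals in Seth's example can be checked by hand. The following is a minimal sketch, not Clang code: toSurrogatePair and wideUnitsFor are hypothetical helpers, and the wchar_t width is passed in as a plain parameter standing in for the target's setting.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Split a supplementary-plane code point (U+10000..U+10FFFF) into its
// UTF-16 high and low surrogate code units.
std::pair<std::uint16_t, std::uint16_t> toSurrogatePair(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    std::uint32_t v = cp - 0x10000;  // 20-bit offset into the supplementary planes
    std::uint16_t high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    std::uint16_t low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
    return {high, low};
}

// Code units a wide literal would hold for a single \U escape, given an
// assumed target wchar_t width in bytes (2 or 4).
std::vector<std::uint32_t> wideUnitsFor(std::uint32_t cp, unsigned wcharBytes) {
    if (wcharBytes == 2 && cp >= 0x10000) {
        std::pair<std::uint16_t, std::uint16_t> p = toSurrogatePair(cp);
        return {p.first, p.second};
    }
    return {cp};
}
```

For U+1F47F, toSurrogatePair gives 0xD83D and 0xDC7F, so with a 2-byte wchar_t the \U form yields the same two code units as the two \x escapes. With a 4-byte wchar_t the \U form yields the single unit 0x1F47F, while the \x form still denotes two separate units.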