[cfe-dev] [PATCH] C++0x unicode string and character literals now with test cases

Mon Oct 24 20:01:07 PDT 2011

On Oct 24, 2011, at 7:13 PM, Eli Friedman wrote:

> On Mon, Oct 24, 2011 at 6:43 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>> 
>> Consider the literals L"\xD83D\xDC7F" and L"\U0001F47F" (the former being the UTF-16 surrogate pair corresponding to the latter). For a Windows target these two literals are indistinguishable after we do the initial translation in Sema::ActOnStringLiteral: the resulting StringLiteral stores the same data and has the same Kind and CharByteWidth. But on a platform with sizeof(wchar_t)==4 the two are not the same.
> 
> That isn't what I meant; it's okay for the serialized AST to vary
> across targets.  It just shouldn't vary across hosts.

FWIW, serialization does vary across hosts, because we don't take any care whatsoever to deal with endianness issues in the AST reader/writer (and we tend to read integers from mmap'd files).

	- Doug