[cfe-dev] Wide strings and clang::StringLiteral.

Fri Dec 5 11:00:35 PST 2008

On Dec 5, 2008, at 13:48, Eli Friedman wrote:

> On Fri, Dec 5, 2008 at 4:41 AM, Neil Booth <neil at daikokuya.co.uk>  
> wrote:
>> so why not just require ASCII supersets like
>> the standard does (for ASCII hosts)?  Then your caret diagnostics
>> keep working too, and special-casing the extra characters is
>> straightforward, even for SJIS.
>
> The issue with SJIS in particular is that sometimes ASCII bytes don't
> actually represent ASCII.  Although, looking at the character set more
> carefully, it looks like that doesn't actually affect the lexer unless
> we allow Japanese characters in identifiers... that's kind of nice.
>
> I don't see where the standard requires an ASCII superset; it
> certainly requires a lot of characters from ASCII, but EBCDIC, for
> example, appears to be a legal source character set.  Oddly, though,
> UTF-16 appears to be an illegal source character set... that seems
> slightly strange to me, since nothing really depends on the source
> character set.

UTF-16 sources are not unheard of in Windows-only code. cl handles
them transparently, so it might be unwise to make any design decisions
that preclude them, regardless of what the standard says on the matter.
cl uses encoding sniffing to accomplish this, since system headers
are in ASCII.
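
A minimal sketch of what such sniffing might look like, assuming it
amounts to checking the start of the buffer for a byte order mark and
falling back to an ASCII-compatible encoding otherwise. The names
SourceEncoding and sniffEncoding are hypothetical; this is not cl's or
clang's actual logic, only an illustration of the idea.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type: which encoding the buffer appears to use.
enum class SourceEncoding { Utf8Bom, Utf16LE, Utf16BE, AssumedAsciiSuperset };

// Inspect the leading bytes of a source buffer for a byte order mark.
// Assumption: a file without a BOM is treated as an ASCII superset,
// which covers plain-ASCII system headers.
SourceEncoding sniffEncoding(const std::string &buf) {
  auto byteAt = [&](std::size_t i) -> unsigned char {
    return static_cast<unsigned char>(buf[i]);
  };
  // UTF-8 BOM: EF BB BF
  if (buf.size() >= 3 && byteAt(0) == 0xEF && byteAt(1) == 0xBB &&
      byteAt(2) == 0xBF)
    return SourceEncoding::Utf8Bom;
  // UTF-16 little-endian BOM: FF FE
  if (buf.size() >= 2 && byteAt(0) == 0xFF && byteAt(1) == 0xFE)
    return SourceEncoding::Utf16LE;
  // UTF-16 big-endian BOM: FE FF
  if (buf.size() >= 2 && byteAt(0) == 0xFE && byteAt(1) == 0xFF)
    return SourceEncoding::Utf16BE;
  // No BOM: assume an ASCII-compatible encoding.
  return SourceEncoding::AssumedAsciiSuperset;
}
```

With a check like this done once per file, a UTF-16 source could be
transcoded before lexing, while BOM-less ASCII headers would pass
through unchanged.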

— Gordon