[cfe-dev] Wide strings and clang::StringLiteral.

Fri Dec 5 10:48:26 PST 2008

On Fri, Dec 5, 2008 at 4:41 AM, Neil Booth <neil at daikokuya.co.uk> wrote:
> so why not just require ASCII supersets like
> the standard does (for ASCII hosts)?  Then your caret diagnostics
> keep working too, and special-casing the extra characters is
> straightforward, even for SJIS.

The issue with SJIS in particular is that sometimes ASCII bytes don't
actually represent ASCII.  Although, looking at the character set more
carefully, it looks like that doesn't actually affect the lexer unless
we allow Japanese characters in identifiers... that's kind of nice.
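
To make the SJIS point concrete, here is a rough sketch of one way an
ASCII-valued byte can fail to be ASCII: in Shift_JIS the second (trail) byte
of a double-byte character may fall in the range 0x40-0x7E. For example,
katakana SO is encoded as 0x83 0x5C, and 0x5C is also the backslash. Whether
this is the exact case meant above is an assumption. The helper names
(isSjisLeadByte, countStandaloneBackslashes) are made up for illustration and
are not part of clang.

```cpp
#include <cstddef>
#include <string_view>

// Lead-byte ranges of Shift_JIS double-byte characters.
inline bool isSjisLeadByte(unsigned char c) {
  return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Hypothetical helper: counts 0x5C bytes that are standalone single-byte
// characters, ignoring 0x5C bytes that are trail bytes of a double-byte
// character.
inline std::size_t countStandaloneBackslashes(std::string_view text) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(text[i]);
    if (isSjisLeadByte(c)) {
      ++i;  // skip the trail byte, whatever its value
      continue;
    }
    if (c == 0x5C)
      ++count;
  }
  return count;
}
```

A scanner that only compares bytes against '\\' would treat the second byte of
0x83 0x5C as an escape or line-splice character; one that tracks lead bytes,
as above, would not.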

I don't see where the standard requires an ASCII superset; it
certainly requires a lot of characters from ASCII, but EBCDIC, for
example, appears to be a legal source character set.  Oddly, though,
UTF-16 appears to be an illegal source character set... that seems
slightly strange to me, since nothing really depends on the source
character set.

I don't see why any of this affects caret diagnostics, though: column
counts should be the same no matter what encoding is used.
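
A minimal sketch of that claim, assuming columns are counted in characters
rather than bytes and taking UTF-8 as the example encoding: the column of a
byte offset is found by counting only the bytes that start a character. The
function name utf8Column is hypothetical, and the sketch ignores tabs and
double-width characters.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: 1-based column of the character that starts at
// byteOffset in a UTF-8 encoded line. Continuation bytes (10xxxxxx) do not
// start a character, so they do not advance the column.
inline std::size_t utf8Column(std::string_view line, std::size_t byteOffset) {
  std::size_t column = 1;
  for (std::size_t i = 0; i < byteOffset && i < line.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(line[i]);
    if ((c & 0xC0) != 0x80)
      ++column;
  }
  return column;
}
```

With this, the 'x' in an e-acute followed by 'x' lands in column 2 in UTF-8
(bytes 0xC3 0xA9 'x'), the same column it has in a single-byte encoding where
the line is just two bytes long.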

-Eli