[cfe-dev] Wide strings and clang::StringLiteral.

Chris Lattner clattner at apple.com
Fri Dec 5 09:02:05 PST 2008


On Dec 5, 2008, at 4:41 AM, Neil Booth wrote:
>> 1) Document StringLiteral as being canonicalized to UTF-8.  We'll
>> require sema to translate the input string to UTF-8, and codegen and
>> other clients to convert it to the character set they want.
>> 2) Add -finput-charset to clang.  Is iconv generally available (e.g.
>> on Windows)?  If not, we'll need some configury magic to detect it.
>> 3) Teach sema about UTF-8 input and iconv.  Sema should handle the
>> default cases (e.g. UTF-8 and character sets where no "bad" things
>> occur) as quickly as possible, while falling back to iconv for hard
>> cases (or emitting an error if iconv isn't available).
>> 4) Enhance the lexer, if required, to handle lexing strings properly.
>> 5) Enhance codegen to translate into the execution char set.
>> 6) Start working on character constants.
>>
>> Does this seem reasonable, Paolo (and Neil)?
>
> It should work, but I expect it will break caret diagnostics.
>
> There's no real need for such flexibility though - the standard
> doesn't permit UTF-16, UTF-32 etc; and I've never heard of anyone
> wanting to use them, so why not just require ASCII supersets like
> the standard does (for ASCII hosts)?  Then your caret diagnostics
> keep working too, and special-casing the extra characters is
> straightforward, even for SJIS.

I have no desire to go above and beyond the standard unless (e.g.) GCC  
supports some extension and there is a large body of code that depends  
on it.

Please remember that I know very little about this, so if I suggest  
something silly, it is probably out of ignorance rather than some  
devious plan :)

> The standard also requires input to be in the current locale; is
> there any need to be more relaxed?

No.

> Realistically all the source
> has to be in the same charset, and that charset must include the
> ability to read the system headers.  You then just get to use
> mbtowc in a few places.

Can you give some pseudocode of what you mean?
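
My rough guess at the kind of thing you mean is below: step through a
source buffer one character at a time in the current locale, e.g. to
work out a caret column.  This is only a sketch; the helper name
count_locale_chars is made up for illustration, and it treats every
character as one column wide, which real caret code could not assume.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of clang): count the characters, not
   the bytes, in buf[0..len) as decoded by the current locale's
   multibyte encoding.  Returns (size_t)-1 on an invalid sequence. */
size_t count_locale_chars(const char *buf, size_t len) {
    size_t count = 0;
    size_t i = 0;

    /* Reset mbtowc's internal shift state before decoding. */
    (void)mbtowc(NULL, NULL, 0);

    while (i < len) {
        wchar_t wc;
        int n = mbtowc(&wc, buf + i, len - i);
        if (n < 0)
            return (size_t)-1;  /* invalid or truncated sequence */
        if (n == 0)
            n = 1;              /* an embedded NUL occupies one byte */
        i += (size_t)n;
        count++;
    }
    return count;
}
```

For pure ASCII input this just walks byte by byte, so the common case
stays simple; only the multibyte characters take the longer path.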

-Chris


