[cfe-dev] Wide strings and clang::StringLiteral.

Thu Dec 4 16:21:02 PST 2008

On Dec 3, 2008, at 1:02 AM, Paolo Bolzoni wrote:
> dear cfe-devs,
>
> I think we are worrying about less important details. Universal character
> names in identifiers are, of course, important. But I think it is much more
> urgent finding a way to manage wide string correctly.

Yes, I agree that this is the right starting point.

> So what about focusing about a normalized way to memorize wide strings and
> thinking about extended characters in identifiers later?

Sounds great to me.  A disclaimer: I don't know anything about this
stuff.  Neil, I'd very much appreciate validation that this approach
makes sense :).

Here are some starting steps:

1) Document StringLiteral as being canonicalized to UTF-8.  We'll
require Sema to translate the input string to UTF-8, and codegen and
other clients to convert it to the character set they want.
2) Add -finput-charset to clang.  Is iconv generally available (e.g.
on Windows)?  If not, we'll need some configury magic to detect it.
3) Teach Sema about UTF-8 input and iconv.  Sema should handle the
default cases (e.g. UTF-8 and character sets where no "bad" things
occur) as quickly as possible, while falling back to iconv for hard
cases (or emitting an error if iconv isn't available).
4) Enhance the lexer, if required, to handle lexing strings properly.
5) Enhance codegen to translate into the execution character set.
6) Start working on character constants.
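
To make steps 1 and 5 a bit more concrete, here is a rough sketch of
what a client might do with a string stored as canonical UTF-8 when it
wants 32-bit wide characters.  Everything in it is an assumption for
illustration: decodeUTF8 is a hypothetical helper, not an existing
clang API, and it assumes the target's wide character type holds
UTF-32 code points.  It takes a fast path for plain ASCII bytes and
rejects malformed input instead of guessing.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of clang): decode a UTF-8 byte string
// into UTF-32 code points. Returns false if the input is malformed.
bool decodeUTF8(const std::string &In, std::vector<uint32_t> &Out) {
  Out.clear();
  std::size_t I = 0, N = In.size();
  while (I < N) {
    unsigned char B = static_cast<unsigned char>(In[I]);

    // Fast path: plain ASCII maps to itself.
    if (B < 0x80) {
      Out.push_back(B);
      ++I;
      continue;
    }

    // Work out the sequence length, the payload bits of the lead byte,
    // and the smallest code point that legitimately needs this length.
    unsigned Len;
    uint32_t CP, Min;
    if ((B & 0xE0) == 0xC0) {
      Len = 2; CP = B & 0x1F; Min = 0x80;
    } else if ((B & 0xF0) == 0xE0) {
      Len = 3; CP = B & 0x0F; Min = 0x800;
    } else if ((B & 0xF8) == 0xF0) {
      Len = 4; CP = B & 0x07; Min = 0x10000;
    } else {
      return false; // stray continuation byte or invalid lead byte
    }

    if (N - I < Len)
      return false; // sequence is truncated

    // Each continuation byte must look like 10xxxxxx.
    for (unsigned K = 1; K < Len; ++K) {
      unsigned char C = static_cast<unsigned char>(In[I + K]);
      if ((C & 0xC0) != 0x80)
        return false;
      CP = (CP << 6) | (C & 0x3F);
    }

    // Reject overlong encodings, surrogates, and out-of-range values.
    if (CP < Min || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return false;

    Out.push_back(CP);
    I += Len;
  }
  return true;
}
```

A target with 16-bit wide characters would need an extra step to split
code points above 0xFFFF into surrogate pairs, and other execution
character sets would go through iconv or a table instead.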

Does this seem reasonable, Paolo (and Neil)?

-Chris