[cfe-dev] Wide strings and clang::StringLiteral.
Chris Lattner
clattner at apple.com
Thu Dec 4 16:21:02 PST 2008
On Dec 3, 2008, at 1:02 AM, Paolo Bolzoni wrote:
> dear cfe-devs,
>
> I think we are worrying about less important details. Universal
> character names in identifiers are, of course, important. But I think it
> is much more urgent to find a way to manage wide strings correctly.
Yes, I agree that this is the right starting point.
> So what about focusing on a normalized way to store wide strings and
> thinking about extended characters in identifiers later?
Sounds great to me. A disclaimer: I don't know anything about this
stuff. Neil, I'd very much appreciate validation that this approach
makes sense :).
Here are some starting steps:
1) Document StringLiteral as being canonicalized to UTF-8. We'll
require sema to translate the input string to UTF-8, and codegen and
other clients to convert it to the character set they want.
2) Add -finput-charset to clang. Is iconv generally available (e.g.
on Windows)? If not, we'll need some configury magic to detect it.
3) Teach sema about UTF-8 input and iconv. Sema should handle the
default cases (e.g. UTF-8 and character sets where no "bad" things
occur) as quickly as possible, while falling back to iconv for hard
cases (or emitting an error if iconv isn't available).
4) Enhance the lexer, if required, to handle lexing strings properly.
5) Enhance codegen to translate into the execution char set.
6) Start working on character constants.
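To make steps 1, 3 and 5 a bit more concrete, here is a minimal C++ sketch of what the sema and codegen sides might look like. It is an illustration under stated assumptions, not clang code: every name (InputCharset, canonicalizeToUtf8, decodeUtf8, wideLiteralUnits) is hypothetical; Latin-1 stands in for a "character set where no bad things occur"; the iconv fallback is only a stub; and codegen assumes a target whose wchar_t is 32 bits wide.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical input charsets; "Other" is where iconv would come in.
enum class InputCharset { UTF8, Latin1, Other };

// Append one Unicode code point to Out, encoded as UTF-8.
static void appendUtf8(std::string &Out, char32_t CP) {
  assert(CP <= 0x10FFFF && "not a Unicode code point");
  if (CP < 0x80) {
    Out += static_cast<char>(CP);
  } else if (CP < 0x800) {
    Out += static_cast<char>(0xC0 | (CP >> 6));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    Out += static_cast<char>(0xE0 | (CP >> 12));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  } else {
    Out += static_cast<char>(0xF0 | (CP >> 18));
    Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
    Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
    Out += static_cast<char>(0x80 | (CP & 0x3F));
  }
}

// Decode UTF-8 into code points; returns nullopt on malformed input.
std::optional<std::u32string> decodeUtf8(const std::string &In) {
  std::u32string Out;
  size_t I = 0;
  while (I < In.size()) {
    unsigned char B = In[I];
    char32_t CP;
    size_t Len;
    if (B < 0x80) { CP = B; Len = 1; }
    else if ((B & 0xE0) == 0xC0) { CP = B & 0x1F; Len = 2; }
    else if ((B & 0xF0) == 0xE0) { CP = B & 0x0F; Len = 3; }
    else if ((B & 0xF8) == 0xF0) { CP = B & 0x07; Len = 4; }
    else return std::nullopt;
    if (I + Len > In.size()) return std::nullopt;
    for (size_t K = 1; K < Len; ++K) {
      unsigned char C = In[I + K];
      if ((C & 0xC0) != 0x80) return std::nullopt; // not a continuation byte
      CP = (CP << 6) | (C & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogates, and values past U+10FFFF.
    static const char32_t MinForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    if (CP < MinForLen[Len] || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return std::nullopt;
    Out += CP;
    I += Len;
  }
  return Out;
}

// Sema side (steps 1 and 3): canonicalize a literal's bytes to UTF-8.
std::optional<std::string> canonicalizeToUtf8(const std::string &Bytes,
                                              InputCharset CS) {
  switch (CS) {
  case InputCharset::UTF8:
    // Fast path: already UTF-8, so only validate it.
    if (decodeUtf8(Bytes)) return Bytes;
    return std::nullopt;
  case InputCharset::Latin1: {
    // Easy case: each Latin-1 byte is the code point with the same value.
    std::string Out;
    for (unsigned char B : Bytes) appendUtf8(Out, B);
    return Out;
  }
  case InputCharset::Other:
    // Hard case: a real implementation would call iconv here, or emit an
    // error if iconv isn't available. This sketch just reports failure.
    return std::nullopt;
  }
  return std::nullopt;
}

// Codegen side (step 5): assuming a 32-bit wchar_t, the code units of a
// wide literal are simply the decoded code points of the UTF-8 contents.
std::u32string wideLiteralUnits(const std::string &Utf8) {
  return decodeUtf8(Utf8).value_or(std::u32string());
}
```

With this split, a Latin-1 byte 0xE9 would be stored in the StringLiteral as the UTF-8 pair C3 A9, and codegen would turn it back into the single wide code unit U+00E9. Keeping validation in sema means codegen can treat the stored bytes as trusted UTF-8.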
Does this seem reasonable, Paolo (and Neil)?
-Chris