[cfe-dev] C99/C++ UCN (Universal Character Name) Support

Eric Christopher echristo at apple.com
Mon Mar 30 11:23:30 PDT 2009


On Mar 27, 2009, at 6:02 PM, Eli Friedman wrote:

> On Fri, Mar 27, 2009 at 5:45 PM, steve naroff <snaroff at apple.com> wrote:
>> Part of implementing this is converting UTF-16 (\u) and UTF-32 (\U) to
>> UTF-8 (for insertion into a C-string, say).
>
> It's not very hard; one version of the formula is available at
> http://en.wikipedia.org/wiki/UTF-8.  And UTF-16 isn't really relevant
> here; \u denotes a Unicode code point, not a UTF-16 code unit.
>
>> Unfortunately, Unix doesn't appear to have any standard support for
>> this type of conversion (which surprised me).
>
> You could use iconv, although that's overkill here...

Neil is my own personal deity on this, but gcc uses a combination of iconv
and its own converters to deal with various character set issues.  For these
two, IIRC, Neil/Zack wrote their own.  iconv works really well once you start
getting into the more... esoteric execution character set issues, though.
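
For reference, the code point to UTF-8 formula Eli points to is short enough
to write out.  This is only a minimal sketch in C, not what gcc or clang
actually ship: the name encode_utf8 and the choice to return 0 for surrogates
and for values above U+10FFFF are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point (the value named by \uXXXX or \UXXXXXXXX)
   as UTF-8.  Writes up to 4 bytes into out and returns the byte count,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
static size_t encode_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)     /* surrogates are not code points to encode */
        return 0;
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

With this, \u20AC (the euro sign) comes out as the three bytes E2 82 AC.  It
only covers a UTF-8 execution character set; anything else is where iconv or
hand-written converters come in.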

-eric


