[cfe-dev] Wide strings and clang::StringLiteral.

Dan Villiom Podlaski Christiansen danchr at gmail.com
Fri Dec 5 05:40:10 PST 2008


On 5 Dec 2008, at 11:17, Sebastian Redl wrote:

>
> On Fri, 05 Dec 2008 11:01:27 +0100, Cédric Venet
> <cedric.venet at laposte.net> wrote:
>>>> set && string contains only
>>>> characters in the range 0-0x7f" and having a slow path for everything
>>>> else.
>>>>
>>>
>>> Ah, right, you want to store the strings in UTF-8.  That seems fine; I
>>> expect non-ASCII in strings is very rare
>>
>> For French programs, and probably other non-English languages,
>> non-ASCII in strings is *not* very rare. Every accented character is
>> non-ASCII, and most French sentences will have at least one accented
>> character.
>
> The question is, how many localized applications of significant size don't
> manage their strings in some external resource that doesn't affect
> compilation?


Remember that this isn't just a Japanese issue; quite a lot of languages
commonly use non-ASCII characters: French, German and Spanish, to name a
few. I suspect that surprisingly many domain-specific code bases are never
localised to more than one language, and so may well keep their strings
directly in the source rather than in an external resource.

I second Jean-Daniel Dupas' recommendation of ICU; besides translating
between encodings, its extensive support for Unicode normalisation and
canonicalisation might be useful. Imagine, for instance, a rewriter which
enabled printf() and so on to gracefully degrade smart quotes depending on
the runtime encoding :)
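
To make the smart-quote idea a bit more concrete, here is a rough sketch in
plain C++. It uses neither ICU nor anything from clang; isAllAscii() and
degradeSmartQuotes() are names I made up for illustration, and the code
assumes the literal is already stored as well-formed UTF-8. It also shows the
"fast path for 0-0x7f, slow path for everything else" split discussed
earlier in the thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if every byte is in the range 0-0x7f,
// i.e. the string can take the cheap ASCII-only path.
bool isAllAscii(const std::string &s) {
  for (unsigned char c : s)
    if (c > 0x7f)
      return false;
  return true;
}

// Hypothetical helper: replace the four common "smart" quotes with their
// ASCII equivalents. Assumes the input is well-formed UTF-8; all other
// bytes are copied through unchanged.
std::string degradeSmartQuotes(const std::string &utf8) {
  if (isAllAscii(utf8))
    return utf8; // fast path: nothing to degrade

  std::string out;
  out.reserve(utf8.size());
  for (std::size_t i = 0; i < utf8.size();) {
    // U+2018, U+2019, U+201C and U+201D are encoded as E2 80 98,
    // E2 80 99, E2 80 9C and E2 80 9D respectively.
    if (i + 2 < utf8.size() &&
        static_cast<unsigned char>(utf8[i]) == 0xE2 &&
        static_cast<unsigned char>(utf8[i + 1]) == 0x80) {
      unsigned char b = static_cast<unsigned char>(utf8[i + 2]);
      if (b == 0x98 || b == 0x99) { // single smart quotes
        out += '\'';
        i += 3;
        continue;
      }
      if (b == 0x9C || b == 0x9D) { // double smart quotes
        out += '"';
        i += 3;
        continue;
      }
    }
    out += utf8[i++];
  }
  return out;
}
```

A real rewriter would presumably only do this when the runtime encoding
can't represent the quotes, and would lean on ICU for the general case
instead of a hand-written table like the one above.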

--

Dan Villiom Podlaski Christiansen, stud. scient.
danchr at gmail.com




