[cfe-dev] Wide strings and clang::StringLiteral.

Jean-Daniel Dupas devlists at shadowlab.org
Tue Dec 2 00:56:26 PST 2008


On Dec 2, 2008, at 06:10, Chris Lattner wrote:

>
> On Nov 29, 2008, at 1:00 AM, Paolo Bolzoni wrote:
>
>>
>> I need to convert string literals to another encoding. I was
>> planning to use iconv.h's functions, but I need to know the
>> encoding of the input strings.
>>
>> So the question is: what encoding do the strings returned by
>> clang::StringLiteral::getStrData() have, especially wide ones?
>
> Hi Paolo,
>
> I really have no idea.  We're just reading in the raw bytes from the
> source file, so I guess it depends on whatever the source encoding
> is.  In practice, this sounds like a really bad idea :).
>
> Clang doesn't have any notion of an input character set at present,
> and doesn't handle Unicode escapes.  How do other compilers handle
> input character sets?  Are there command line options to specify it?
> Should the AST hold the string in a canonical form like UTF-8?
>
> -Chris

GCC supports all iconv encodings via the -finput-charset= option.
It also has -fexec-charset= and -fwide-exec-charset= to specify the
encoding of generated string constants.

"input" defaults to the locale encoding if one is defined, and
otherwise falls back to UTF-8.
"exec" defaults to UTF-8.
"wide-exec" defaults to UTF-16 or UTF-32, depending on the size of
wchar_t.

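To illustrate the "wide-exec" default, here is a small sketch of how a
compiler could pick the charset name from the width of wchar_t. The
function names are hypothetical (they are neither GCC nor Clang APIs),
and the sketch ignores byte order, which a real implementation would
also have to settle.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a GCC or Clang API: map the size of wchar_t
// (in bytes) to the name of the default wide execution charset.
std::string wideExecCharsetFor(std::size_t wcharBytes) {
    switch (wcharBytes) {
        case 2: return "UTF-16";  // e.g. targets with a 16-bit wchar_t
        case 4: return "UTF-32";  // e.g. targets with a 32-bit wchar_t
        default: throw std::invalid_argument("unexpected wchar_t size");
    }
}

// Default for the platform this code is compiled for.
std::string defaultWideExecCharset() {
    return wideExecCharsetFor(sizeof(wchar_t));
}
```

The resulting name could then be handed to whatever conversion backend
is in use, in the same way GCC accepts a name after -fwide-exec-charset=.
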
Clang may have to do some string manipulation while compiling (for
example, converting non-ASCII constant CF/NSStrings into UTF-16). This
will probably be easier to handle if the AST strings use a predefined
encoding (UTF-8). It will also be simpler for clients that want to
manipulate strings.
Not to mention UTF-16/UTF-32 source files (GCC supports them). It would
be very difficult (if not impossible) to keep them in UTF-16
internally, as most functions expect NUL-terminated C strings and
UTF-16 text contains embedded zero bytes.
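
To make the CF/NSString case concrete, here is a minimal sketch of a
UTF-8 to UTF-16 transcoder, assuming the AST already holds the literal
as UTF-8. The name utf8ToUtf16 is hypothetical and is not a Clang API.
The sketch does not reject overlong encodings, so real code would need
stricter validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper, not a Clang API: transcode UTF-8 bytes into
// UTF-16 code units. Overlong encodings are NOT rejected in this sketch.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;   // decoded code point
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            throw std::invalid_argument("code point is not a Unicode scalar value");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a single canonical encoding in the AST, this is the only direction
such a client would need. Without one, it would first have to guess the
source encoding.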

IMHO, if someone considers adding charset handling, they should
consider writing a converter class (based on iconv, for example)
rather than calling iconv functions directly. That would make it
easier to switch the underlying library later if needed (to ICU, for
example).
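
A rough sketch of the kind of converter class I have in mind, assuming
POSIX iconv is available. CharsetConverter and IconvConverter are
hypothetical names, not existing Clang classes, and the error handling
is deliberately simple.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical interface (not part of Clang). Callers depend only on
// this, so the backend could later be swapped (for ICU, for example).
class CharsetConverter {
public:
    virtual ~CharsetConverter() = default;
    virtual std::string convert(const std::string& input) = 0;
};

// iconv-backed implementation. Assumes the POSIX signature where the
// input buffer parameter is `char**`.
class IconvConverter : public CharsetConverter {
public:
    IconvConverter(const char* from, const char* to)
        : cd_(iconv_open(to, from)) {  // note: iconv_open takes (to, from)
        if (cd_ == (iconv_t)(-1))
            throw std::runtime_error("iconv_open: unsupported conversion");
    }
    ~IconvConverter() override { iconv_close(cd_); }
    IconvConverter(const IconvConverter&) = delete;
    IconvConverter& operator=(const IconvConverter&) = delete;

    std::string convert(const std::string& input) override {
        // Reset the descriptor to its initial shift state.
        iconv(cd_, nullptr, nullptr, nullptr, nullptr);

        std::string output;
        char buffer[256];
        char* in = const_cast<char*>(input.data());  // iconv does not modify input
        std::size_t inLeft = input.size();

        while (inLeft > 0) {
            char* out = buffer;
            std::size_t outLeft = sizeof buffer;
            const std::size_t rc = iconv(cd_, &in, &inLeft, &out, &outLeft);
            const int err = errno;  // capture before anything else can change it
            output.append(buffer, sizeof buffer - outLeft);
            // E2BIG only means the chunk buffer is full, so keep looping.
            if (rc == static_cast<std::size_t>(-1) && err != E2BIG)
                throw std::runtime_error("iconv: invalid or incomplete input");
        }

        // Flush any pending shift sequence for stateful encodings.
        char* out = buffer;
        std::size_t outLeft = sizeof buffer;
        iconv(cd_, nullptr, nullptr, &out, &outLeft);
        output.append(buffer, sizeof buffer - outLeft);
        return output;
    }

private:
    iconv_t cd_;
};
```

Usage would be along the lines of
IconvConverter conv("ISO-8859-1", "UTF-8"); followed by
conv.convert(bytes). An ICU-backed class could implement the same
interface later without touching the callers.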




