[cfe-dev] Wide strings and clang::StringLiteral.
Jean-Daniel Dupas
devlists at shadowlab.org
Tue Dec 2 00:56:26 PST 2008
On Dec 2, 2008, at 06:10, Chris Lattner wrote:
>
> On Nov 29, 2008, at 1:00 AM, Paolo Bolzoni wrote:
>
>>
>> I need to convert the string literals to another encoding. I was
>> planning to use iconv.h's functions, but I need to know the encoding
>> of the input strings.
>>
>> So the question is: what encoding do the strings returned by
>> clang::StringLiteral::getStrData() have, especially the wide ones?
>
> Hi Paolo,
>
> I really have no idea. We're just reading in the raw bytes from the
> source file, so I guess it depends on whatever the source encoding
> is. In practice, this sounds like a really bad idea :).
>
> Clang doesn't have any notion of an input character set at present,
> and doesn't handle unicode escapes. How do other compilers handle
> input character sets? Are there command line options to specify it?
> Should the AST hold the string in a canonical form like UTF8?
>
> -Chris
GCC supports all iconv encodings via the -finput-charset= option.
It also has -fexec-charset= and -fwide-exec-charset= to specify the
encoding of the generated string constants.
"input" defaults to the locale encoding if one is defined, and otherwise
falls back to UTF-8.
"exec" defaults to UTF-8.
"wide-exec" defaults to UTF-16 or UTF-32, depending on the size of wchar_t.

Clang may have to do some string manipulation while compiling (for
example, converting non-ASCII constant CF/NSStrings into UTF-16). That
will probably be easier to handle if the AST strings use a predefined
encoding (UTF-8). It will also be simpler for clients that want to
manipulate strings.
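
To illustrate why a fixed UTF-8 representation makes this kind of conversion straightforward, here is a rough sketch of turning UTF-8 into UTF-16 code units. The function name utf8_to_utf16 is made up for the example and is not a Clang API, and the code assumes the input is already well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Illustrative only: converts well-formed UTF-8 to UTF-16 code units.
// Assumption: the input is valid UTF-8; malformed bytes are not diagnosed.
inline std::vector<std::uint16_t> utf8_to_utf16(const std::string &in) {
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte tells us how many bytes the sequence occupies.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        assert(i + len <= in.size());  // a truncated sequence is a bug here
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp < 0x10000) {
            out.push_back(static_cast<std::uint16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With a single known input encoding, this is the only decoder needed; with raw source bytes, the same step would first have to guess or be told what the bytes mean.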

Not to mention UTF-16/UTF-32 source files (GCC supports them). It would
be very difficult (if not impossible) to keep them in UTF-16
internally, as most functions expect C strings.
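
A small sketch of the C-string problem: UTF-16 text contains zero bytes for every ASCII character, so anything that relies on NUL termination stops early. The names below are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// "Hi" encoded as UTF-16LE, followed by a two-byte terminator.
static const char kHiUtf16LE[] = {'H', 0x00, 'i', 0x00, 0x00, 0x00};

// Number of payload bytes actually stored (terminator excluded).
inline std::size_t utf16_payload_bytes() { return sizeof(kHiUtf16LE) - 2; }

// What a C-string function sees: it stops at the first zero byte,
// which here is the high byte of 'H'.
inline std::size_t c_string_view_of_utf16() {
    return std::strlen(kHiUtf16LE);
}
```

Here the buffer holds 4 bytes of text, but strlen reports 1, so every such call site would need an explicit length instead.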

IMHO, if someone considers adding charset handling, they may want to
write a converter class based on iconv, for example, rather than call
iconv functions directly. That will make it easier to switch the
underlying library if needed (to ICU, for example).
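
Something along these lines is what I have in mind. It is only a sketch: the class names CharsetConverter and IconvConverter are hypothetical, nothing like this exists in Clang today, and it assumes a POSIX iconv whose second argument is char ** (as on glibc).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Hypothetical interface: callers depend on this, never on iconv itself,
// so another backend (ICU, for instance) could be swapped in later.
class CharsetConverter {
public:
    virtual ~CharsetConverter() {}
    virtual std::string convert(const std::string &input) = 0;
};

// One possible backend, built on POSIX iconv.
class IconvConverter : public CharsetConverter {
public:
    IconvConverter(const char *from, const char *to)
        : cd_(iconv_open(to, from)) {  // note: iconv_open takes (to, from)
        if (cd_ == (iconv_t)-1)
            throw std::runtime_error("unsupported charset conversion");
    }
    ~IconvConverter() override { iconv_close(cd_); }

    IconvConverter(const IconvConverter &) = delete;
    IconvConverter &operator=(const IconvConverter &) = delete;

    std::string convert(const std::string &input) override {
        assert(cd_ != (iconv_t)-1);
        if (input.empty())
            return std::string();
        iconv(cd_, nullptr, nullptr, nullptr, nullptr);  // reset shift state
        std::string in_copy(input);  // iconv wants a mutable buffer
        char *in = &in_copy[0];
        std::size_t in_left = in_copy.size();
        std::string out;
        char buf[256];
        while (in_left > 0) {
            char *out_ptr = buf;
            std::size_t out_left = sizeof(buf);
            std::size_t rc = iconv(cd_, &in, &in_left, &out_ptr, &out_left);
            int err = errno;  // save before any other call can touch it
            out.append(buf, sizeof(buf) - out_left);
            // E2BIG only means the scratch buffer filled up; loop again.
            if (rc == (std::size_t)-1 && err != E2BIG)
                throw std::runtime_error("invalid or incomplete input");
        }
        // Stateful target encodings would also need a final flush call.
        return out;
    }

private:
    iconv_t cd_;
};
```

A client would then write something like IconvConverter("ISO-8859-1", "UTF-8").convert(bytes) and only ever hold a CharsetConverter reference, so replacing the backend touches one class.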