[cfe-dev] Constant CF/NSString and Unicode

Jean-Daniel Dupas devlists at shadowlab.org
Thu Jul 10 07:23:33 PDT 2008

On 10 Jul 2008, at 16:00, Matthew Jimenez wrote:

> On Jul 10, 2008, at 1:50 AM, Jean-Daniel Dupas wrote:
>> On 10 Jul 2008, at 07:56, Chris Lattner wrote:
>>> On Jul 9, 2008, at 4:10 PM, Jean-Daniel Dupas wrote:
>>>> I just wonder if there is some kind of unicode support in
>>>> __builtin___CFStringMakeConstantString.
>>> In GCC or clang?  Clang doesn't have any unicode support yet.
>>>> In the current GCC version, when you compile an objc file, constant
>>>> strings that contain non-ASCII chars are converted into UTF-16
>>>> strings and a flag is set in the generated CFString.
>>> Ok.  Fariborz implemented that fwiw.
>>>> The fact that it works only for objc files looks more like a design
>>>> decision than a technical limit, and this feature can easily be
>>>> extended to C files. In fact, I managed to implement this feature in
>>>> cc1 and it looks like it works (if I'm wrong, feel free to correct
>>>> me).
>>>> And what about clang and Unicode CFStrings?
>>> I'm not sure what you mean, can you explain a bit more?
>>> -Chris
>> Yep,
>> put this simple code snippet in cfstring.c:
>> #include <CoreFoundation/CoreFoundation.h>
>> int main(int argc, char **argv) {
>>  CFShowStr(CFSTR("hé hé hé"));
>>  CFShow(CFSTR("hé hé hé"));
>>  return 0;
>> }
>> If you compile this file using "gcc -o cfstring cfstring.c -framework
>> CoreFoundation" and run it, you get:
>> Length 11
>> IsEightBit 1
>> HasLengthByte 0
>> HasNullByte 1
>> InlineContents 0
>> Allocator SystemDefault
>> Mutable 0
>> Contents 0x1ff2
>> h\u221a\u00a9 h\u221a\u00a9 h\u221a\u00a9
>> Now, if you compile this same file using
>> gcc -x objective-c -o cfstring cfstring.c -framework CoreFoundation
>> the output is:
>> Length 8
>> IsEightBit 0
>> HasLengthByte 0
>> HasNullByte 0
>> InlineContents 0
>> Allocator SystemDefault
>> Mutable 0
>> Contents 0x1fee
>> h\u00e9 h\u00e9 h\u00e9
>> Maybe I'm missing something, but I really do not understand the current
>> limitation.
>> As clang will probably implement this feature some day, I just wonder
>> if it should duplicate the GCC behavior (emitting a warning and
>> generating an ASCII-based CFString) or if it can be extended to
>> also support UTF-16 CFString generation in plain C files.
> Now I'm curious. Does this behavior change using -fconstant-cfstrings
> instead of defining the language as ObjC? According to the documentation,
> it looks like that is the flag to enable
> __builtin___CFStringMakeConstantString.
> -Matthew

This flag is on by default in modern versions of Xcode (I think it
depends on the -mmacosx-version-min flag).
Turning it off (-fno-constant-cfstrings) removes the compile-time
warning and defers it to runtime ;-)

This is what the app logs when CFSTR is called with "false constant
cfstrings" that contain something that's not ASCII:

WARNING: CFSTR("h\37777777703\37777777651 h\37777777703\37777777651 h\37777777703\37777777651") has non-7 bit chars, interpreting using MacOS Roman encoding for now, but this will change. Please eliminate usages of non-7 bit chars (including escaped characters above \177 octal) in CFSTR().
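
The long octal escapes in that message are consistent with the two UTF-8
bytes of "é" (0xC3 0xA9) being treated as signed chars, sign-extended to
a 32-bit int, and then printed in octal. The sketch below only illustrates
that arithmetic. It is not CoreFoundation's code: render_escaped is a
made-up helper, and it assumes a 32-bit int and a platform where
converting 0xC3 to signed char yields -61.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a CoreFoundation function): copy ASCII bytes
   as-is and render every other byte as a backslash followed by the octal
   value of the sign-extended char.
   Returns the number of characters written, not counting the NUL. */
static size_t render_escaped(char *out, size_t outsz, const char *s)
{
    size_t used = 0;
    size_t len = strlen(s);

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        signed char c = (signed char)s[i];  /* 0xC3 becomes -61 (assumed) */
        int n;

        if (c >= 0) {
            n = snprintf(out + used, outsz - used, "%c", c);
        } else {
            /* -61 widened to a 32-bit unsigned int is 0xFFFFFFC3,
               which is 37777777703 in octal. */
            n = snprintf(out + used, outsz - used, "\\%o",
                         (unsigned int)(int)c);
        }
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';               /* drop a truncated escape */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

Under those assumptions, calling render_escaped on the three bytes
'h', 0xC3, 0xA9 produces h\37777777703\37777777651, which matches the
first word inside the CFSTR() in the warning.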

I'm not surprised by this result.
The GCC __builtin___CFStringMakeConstantString codegen function tries to
determine whether the argument string contains non-ASCII chars. If it finds
one, it tries to convert the string into a Unicode string and to save
it as a constant string in the module.
But the function that converts the string and writes it is
implemented only in the obj-c modules (cc1obj and cc1objplus) and not
in the C one (cc1). So in C the conversion always returns null and GCC
falls back to ASCII string generation.
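
To make that fallback concrete, here is a rough sketch in C of the
decision described above. It is not the GCC source: every name in it
(has_non_ascii, utf8_to_utf16, choose_layout, the cfstring_layout enum)
is invented for illustration, and passing a NULL converter is just a way
to model a front end like cc1 where the conversion is not available. The
decoder is deliberately minimal and does not reject overlong encodings
or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { LAYOUT_ASCII_8BIT, LAYOUT_UTF16 } cfstring_layout;

typedef size_t (*utf16_converter)(const unsigned char *s, size_t len,
                                  uint16_t *out, size_t cap);

/* Returns 1 if any byte is outside 7-bit ASCII. */
static int has_non_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 0x80)
            return 1;
    return 0;
}

/* Minimal UTF-8 to UTF-16 decoder. Returns the number of UTF-16 code
   units written, or 0 on malformed input or insufficient space. */
static size_t utf8_to_utf16(const unsigned char *s, size_t len,
                            uint16_t *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = s[i++];
        uint32_t cp;
        size_t extra;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return 0;                      /* stray continuation byte */

        if (extra > len - i)
            return 0;                       /* truncated sequence */
        for (; extra > 0; extra--) {
            unsigned char cont = s[i++];
            if ((cont & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (cont & 0x3F);
        }

        if (cp >= 0x10000) {                /* needs a surrogate pair */
            if (cp > 0x10FFFF || n + 2 > cap)
                return 0;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 > cap)
                return 0;
            out[n++] = (uint16_t)cp;
        }
    }
    return n;
}

/* Try UTF-16 only when the string has non-ASCII bytes and a converter
   exists; otherwise keep the raw bytes as an 8-bit string. */
static cfstring_layout choose_layout(const unsigned char *s, size_t len,
                                     utf16_converter convert,
                                     uint16_t *out, size_t cap,
                                     size_t *out_len)
{
    assert(s != NULL && out_len != NULL);

    if (has_non_ascii(s, len) && convert != NULL) {
        size_t n = convert(s, len, out, cap);
        if (n != 0) {
            *out_len = n;
            return LAYOUT_UTF16;
        }
    }
    *out_len = len;                         /* fallback: raw bytes */
    return LAYOUT_ASCII_8BIT;
}
```

For the 11 UTF-8 bytes of "hé hé hé", choose_layout with utf8_to_utf16
reports a UTF-16 layout of length 8, while the same call with a NULL
converter reports an 8-bit layout of length 11. Those are the same
Length and IsEightBit values as in the two CFShowStr dumps quoted above.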
