[cfe-dev] Objc Apple Runtime and constant strings

Daniel Dunbar daniel at zuster.org
Tue Aug 12 20:27:24 PDT 2008

The plain and simple answer is because I forgot we had it. :)

Thanks for the info on the magic constant, I will look into merging the
definitions and add some more documentation referencing the stuff you
mention tomorrow.

 - Daniel

----- Original Message ----
From: Jean-Daniel Dupas <devlists at shadowlab.org>
To: cfe-dev Developers <cfe-dev at cs.uiuc.edu>
Sent: Tuesday, August 12, 2008 4:13:35 PM
Subject: [cfe-dev] Objc Apple Runtime and constant strings

Just a question.

I saw a patch committed today that adds support for constant strings in
the Apple runtime. Is there a reason to reimplement constant string
generation instead of simply calling GetAddrOfConstantCFString?

llvm::Constant *CGObjCMac::GenerateConstantString(const std::string &String) {
    return CGM.GetAddrOfConstantCFString(String);
}

And while we are here, a word about the magic constant in the code:

    // FIXME: I have no idea what this constant is (it is a magic
    // constant in GCC as well). Most likely the encoding of the string
    // and at least one part of it relates to UTF-16. Is this just the
    // code for UTF-8? Where is this handled for us?
    //  See: <rdr://2996215>
    unsigned flags = 0x07c8;

Obj-C constant strings are nothing more than constant CFStringRefs.

Each CFTypeRef is just a structure whose first bytes are defined as
follows (in Mac OS X 10.5):

struct __CFInstance {
    void *isa;
    uint8_t info[4];
};

Each CFType structure (class) is identified by a unique integer, its
CFTypeID, generated at runtime initialization.
The CFRuntime reserves some space in the info field to store the
CFTypeID of each object.
The remaining space (the low-order byte) can be used by each class to
store internal flags.

Now, to explain why 0x7c8.
7 is the CFString CFTypeID. It is hard coded and will never be changed.

0xc8 (11001000) means:
- immutable string
- zero terminated
- non-Unicode

If you want to generate a Unicode constant string, you should use
0x7d0 instead:
- immutable
- non zero terminated
- Unicode
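
The arithmetic behind the two constants can be sketched in C++ roughly as
follows. The helper names (makeInfo, typeIDOf, classFlagsOf) are made up
for illustration and are not clang or CoreFoundation APIs. The sketch
assumes the flag byte is the low-order byte of a 32-bit info word with the
CFTypeID directly above it, which is what 0x7c8 = (7 << 8) | 0xc8 suggests.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of clang or CoreFoundation.
// Assumption: class-private flags live in the low-order byte of the info
// word, and the CFTypeID occupies the bits directly above that byte.
constexpr std::uint32_t kCFStringTypeID = 7;  // hard-coded, per the text above

// Pack a type ID and a class flag byte into one info word.
constexpr std::uint32_t makeInfo(std::uint32_t typeID, std::uint8_t classFlags) {
    return (typeID << 8) | classFlags;
}

// Recover the type ID (everything above the low-order byte).
constexpr std::uint32_t typeIDOf(std::uint32_t info) {
    return info >> 8;
}

// Recover the class-private flag byte (the low-order byte).
constexpr std::uint8_t classFlagsOf(std::uint32_t info) {
    return static_cast<std::uint8_t>(info & 0xffu);
}

// The two constants discussed above.
static_assert(makeInfo(kCFStringTypeID, 0xc8) == 0x07c8, "8-bit constant string");
static_assert(makeInfo(kCFStringTypeID, 0xd0) == 0x07d0, "Unicode constant string");
```

Under that assumption, switching a constant string from 8-bit to Unicode
only changes the low-order byte; the type ID part stays 7.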

All those flags are documented in the CFString sources:

I = is immutable
E = not inline contents
U = is Unicode
N = has NULL byte
L = has length byte
D = explicit deallocator for contents (for mutable objects, allocator)
C = length field is CFIndex (rather than UInt32); only meaningful for
    64-bit, really
    if needed this bit (valuable real-estate) can be given up for
    another bit elsewhere, since this info is needed just for 64-bit

Also need (only for mutable)
F = is fixed
G = has gap
Cap, DesCap = capacity

B7 B6 B5 B4 B3 B2 B1 B0
          U  N  L  C  I

B6 B5
 0  0   inline contents
 0  1   E (freed with default allocator)
 1  0   E (not freed)
 1  1   E D

!!! Note: Constant CFStrings use the bit patterns:
C8 (11001000 = default allocator, not inline, not freed contents;
    8-bit; has NULL byte; doesn't have length; is immutable)
D0 (11010000 = default allocator, not inline, not freed contents;
    Unicode; is immutable)
The bit usages should not be modified in a way that would affect these
bit patterns.
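
To make the table concrete, here is a small C++ sketch that decodes a flag
byte according to the bit positions quoted above. All names are
hypothetical. The masks are read off the table (U = B4, N = B3, L = B2,
C = B1, I = B0, contents = B6 B5), and B7 as "default allocator" is
inferred from the C8/D0 note. Both constant patterns leave B0 clear while
the note calls them immutable, so the sketch reports B0 raw rather than
guessing its polarity.

```cpp
#include <cstdint>

// Hypothetical names; enumerator order matches the B6 B5 values 0..3.
enum class Contents {
    Inline,                     // 0 0
    FreedWithDefaultAllocator,  // 0 1
    NotFreed,                   // 1 0
    CustomDeallocator           // 1 1 (E D)
};

struct StringFlags {
    bool defaultAllocator;  // B7, inferred from the C8/D0 note
    Contents contents;      // B6 B5
    bool isUnicode;         // B4 (U)
    bool hasNullByte;       // B3 (N)
    bool hasLengthByte;     // B2 (L)
    bool lengthIsCFIndex;   // B1 (C)
    bool bit0;              // B0 (I), reported raw; polarity not assumed
};

// Split a flag byte into the fields listed in the table.
inline StringFlags decodeStringFlags(std::uint8_t f) {
    StringFlags r;
    r.defaultAllocator = (f & 0x80) != 0;
    r.contents         = static_cast<Contents>((f >> 5) & 0x3);
    r.isUnicode        = (f & 0x10) != 0;
    r.hasNullByte      = (f & 0x08) != 0;
    r.hasLengthByte    = (f & 0x04) != 0;
    r.lengthIsCFIndex  = (f & 0x02) != 0;
    r.bit0             = (f & 0x01) != 0;
    return r;
}
```

Decoding 0xC8 this way gives default allocator, not-freed external
contents, 8-bit, NULL byte present and no length byte; 0xD0 differs only in
having the Unicode bit set and the NULL-byte bit clear.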
