[cfe-dev] Even more clang ideas

John Engelhart john.engelhart at gmail.com
Sun Aug 3 15:52:58 PDT 2008

Sorry for the response delay...

On Jul 26, 2008, at 3:02 AM, Chris Hanson wrote:

> On Jul 25, 2008, at 6:28 PM, Ted Kremenek wrote:
>>> It is, in fact, an error to return an NSMutableArray in a method
>>> that's prototyped to return an NSArray due to C pointer aliasing
>>> rules. The 'id' type is the closest thing that Objective-C has to a
>>> 'generic object pointer type', so if a method wants to return a
>>> pointer to an object of more than one type, it really should declare
>>> the return type as 'id'.  Again, this is due to the C pointer
>>> aliasing rules rather than any OO conceptual rules.
>> Again, I'm not certain how much C99's aliasing rules apply to
>> Objective-C object references.  Objective-C doesn't have a formal
>> specification akin to C99, so the specification (if you want to call
>> it that) is whatever the current compiler implementation allows.
> The current specification of Objective-C is "The Objective-C 2.0
> Programming Language" at <http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/>.
> John is also incorrect about the above:  It is *not* an error, in  
> Objective-C, to return an instance of a subclass from a method  
> prototyped as returning an instance of the superclass.

Yes, this traces its roots back to Brad Cox's ideas and the original  
StepStone objc compiler.  This was all done in K&R C days, pre-ANSI  
even, and pointer rules were an awful lot looser back then.

> Objective-C is its own language that extends C99, not a preprocessor  
> for C99, and this is one of the extensions that Objective-C adds.

True enough, but Objective-C isn't exactly formally defined.  As  
someone here put it, it's pretty much "whatever the compiler happens  
to compile."

While it's been a long time since I've hacked on GCC internals or
ported it to a new architecture, it used to be that objc was really
nothing more than an 'integrated preprocessor to GCC'.  Instead of
preprocessing the source and rewriting objc statements into their
equivalent C statements, it directly creates the internal tree
representations, essentially 'rewriting' on the fly.

An example of an older objc preprocessor: ftp://ftp.wustl.edu/pub/aminet/dev/c/OCT-1.99.lha

It's interesting to note that this particular preprocessor seems
to be free from the influence of any other objc front end, and is allegedly a
fairly close translation of 'Object Oriented Programming: An
Evolutionary Approach', which laid out the bulk of objc (circa '86).

From a high-level view, it would seem that GCC still uses the object
== struct representation internally, effectively turning every class
into a struct, with each ivar a member of that struct (including all
of its parent's ivars).  For example:

#import <Foundation/NSObject.h>

@interface MyObject : NSObject { @public int count; void *ptr; }
-(int)count;
-(void)setPtr:(void *)newPtr;
@end

@implementation MyObject
-(int)count { return(count); }
-(void)setPtr:(void *)newPtr { ptr = newPtr; count++; }
@end

int main(int argc, char *argv[]) {
   MyObject *obj = nil;

   obj = [[MyObject alloc] init];

   // Access through message sends
   int x = [obj count];
   [obj setPtr:NULL];
   x = [obj count];

   // Direct access to the @public ivars, just like struct members
   int y = obj->count;
   void *optr = obj->ptr;

   obj->ptr = NULL;

   [obj release];
   return 0;
}

When we look at the GIMPLE representation of this (gcc -fdump-tree-gimple-all
-c FILE.m), it looks like the same basic preprocessor infrastructure
is still in place:

;; Function -[MyObject count] (-[MyObject count])

-[MyObject count] (selfD.2219, _cmdD.2220)
   intD.0 D.2227;

   D.2227 = selfD.2219->countD.2206;
   return D.2227;

;; Function main (main)

main (argcD.2245, argvD.2246)
   // snip
   struct MyObject * objD.2249;
   // snip
   D.2273 = OBJ_TYPE_REF(objc_msgSend_Fast;D.2270->0) (D.2270,  
   objD.2249 = (struct MyObject *) D.2273;
   // snip
   yD.2256 = objD.2249->countD.2206;
   optrD.2260 = objD.2249->ptrD.2207;
   objD.2249->ptrD.2207 = 0B;

Or, in other words, pretty much an objc -> C preprocessed representation.
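For comparison, here is roughly what a hand-written C translation of the MyObject example might look like, assuming the simplified lowering the dump suggests.  This is a sketch, not what GCC literally emits: the Class and SEL typedefs and the MyObject_count / MyObject_setPtr_ function names are hypothetical, and the real runtime types and internal naming differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the runtime's Class and SEL types. */
typedef const void *Class;
typedef const char *SEL;

/* NSObject contributes a single ivar: the isa pointer. */
struct NSObject {
    Class isa;
};

/* The subclass struct carries the inherited ivars first, then its own,
   mirroring the "object == struct" layout described above. */
struct MyObject {
    Class isa;   /* inherited from NSObject */
    int count;
    void *ptr;
};

/* -[MyObject count]: each method becomes a plain C function whose first two
   parameters are the receiver (self) and the selector (_cmd). */
int MyObject_count(struct MyObject *self, SEL _cmd) {
    (void)_cmd;
    assert(self != NULL);
    return self->count;
}

/* -[MyObject setPtr:] */
void MyObject_setPtr_(struct MyObject *self, SEL _cmd, void *newPtr) {
    (void)_cmd;
    assert(self != NULL);
    self->ptr = newPtr;
    self->count++;
}
```

A send such as [obj setPtr:NULL] then corresponds to something like MyObject_setPtr_(obj, sel, NULL), except that the real compiler routes the call through the runtime's dispatch function (objc_msgSend_Fast in the dump above) instead of calling the method directly, while obj->count stays an ordinary struct member access.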

C and ObjC are deeply intertwined.  Because ObjC is built on top of C,  
the rules and quirks of C bubble up to ObjC, but the reverse is not  
necessarily true.  In the old 'preprocessor' ObjC model, this wasn't a  
problem: there was always a 1:1 translation.  Whatever C code emerged  
on the other side, you could analyze it in the context of standard C  
rules.  This model was chosen for its obvious simplicity and it  
allowed for 'strict superset of C' compatibility.

>  In fact, in Objective-C it is not possible to say "this method  
> returns an instance of specifically this class and no other class"  
> -- you can only say "this method returns an instance of this class  
> or any subclass."

I disagree.  Such things were possible in older compilers only because
the pointer rules of the day were so loose.  Type punning was always
frowned upon, and C99's aliasing rules now make it undefined behavior.
Except for unions, C doesn't really provide a way for one pointer to
refer to objects of several different types.  It's not really a
question of what the proper OO design paradigm is; it's a question of
"where the rubber meets the road": how do you represent the concept in C?
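A minimal C99 sketch of that distinction, assuming a 32-bit IEEE 754 float: reading through another union member is the sanctioned route (C99 as amended by TC3), memcpy is the fully portable one, and the pointer cast, left commented out, is the kind of punning the aliasing rules make undefined.  The function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reading a float's bits through a union: the stored bytes are
   reinterpreted as the other member's type. */
uint32_t float_bits_union(float f) {
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}

/* The portable alternative: copy the object representation. */
uint32_t float_bits_memcpy(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);
    memcpy(&u, &f, sizeof u);
    return u;
}

// For contrast only: accessing a float object through a uint32_t lvalue
// violates C99 6.5p7 and is undefined behavior.
// uint32_t float_bits_cast(float f) { return *(uint32_t *)&f; }
```

Both defined versions yield 0x3f800000 for 1.0f on an IEEE 754 target; the commented-out cast might happen to produce the same value on a given compiler, but nothing in C99 guarantees it.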

Take the following Objective-C code:

-(MyObject *)who { return(self); }

The gimple representation is:

  ;; Function -[MyObject who] (-[MyObject who])

  -[MyObject who] (selfD.2246, _cmdD.2247)
    struct MyObject * D.2251;

    D.2251 = selfD.2246;
    return D.2251;

So, when you say you are returning - (NSArray *)arrayByDoingSomething,
and we're using the 'whatever the compiler compiles' standard, you are
literally saying you are returning a 'struct NSArray *'.  Under C99
rules, the meaning of this is very, very clear and unambiguous: you
return a pointer to a struct NSArray and a struct NSArray ONLY.  The
code that calls it is almost certainly going to be something like
'NSArray *array = [obj arrayByDoingSomething];', which again turns
into a 'struct NSArray *'.

To return anything other than a struct NSArray * breaks the C99 type
aliasing rules.  And because ObjC is so tightly integrated with C, and
because the bulk of the GCC compiler is geared towards C, one cannot
simply dismiss this as 'Well, that's how ObjC defines things.'
Because you have a pointer to an object, you must live within C99's
pointer rules.  More to the point, you need to deal with the fact that
the people writing the code generation parts of the compiler are going
to implicitly assume that C99's pointer aliasing rules are being
followed, and will write code that depends on that invariant
being true.
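To make the code-generation concern concrete, here is a small C99 sketch using two hypothetical struct types in place of the lowered NSArray and NSMutableArray.  It shows the kind of assumption an optimizer is permitted to make once it knows two pointers have incompatible pointed-to types; whether a particular GCC version actually performs this optimization is a separate, compiler-specific question.

```c
#include <assert.h>

/* Hypothetical stand-ins for the lowered 'struct NSArray' and
   'struct NSMutableArray': two distinct struct types that both
   happen to contain an int member. */
struct ArrayRep { int count; };
struct MutableArrayRep { int count; };

/* Because 'a' and 'm' point to incompatible types, an optimizer relying
   on the C99 aliasing rules (6.5p7) may assume they never refer to the
   same object, and so may return 1 without re-reading a->count.  Passing
   the same storage through both pointers makes the result undefined;
   the assert spells out that assumption in debug builds. */
int store_then_reload(struct ArrayRep *a, struct MutableArrayRep *m) {
    assert((void *)a != (void *)m);
    a->count = 1;
    m->count = 2;
    return a->count;
}
```

Called with two distinct objects, the normal case, the function returns 1 and the second object ends up with count 2.  The trouble described above only arises when a single object is reached through both pointer types, which is exactly what returning a 'struct NSMutableArray' through a 'struct NSArray *' would do under a naive lowering.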

> That is by design, and is not just an artifact of its original  
> mid-1980s implementation as a preprocessor.

True enough.  It was enabled by the fact that mid-1980s C compilers
allowed such 'free-for-all' type punning.  But C moved on, and
explicitly forbade such type punning.  It is far, FAR from a trivial
matter to untangle the two.


So, my question isn't really about what the Objective-C 2.0 manual says
(which is very little, and definitely not of 'standards definition'
quality); it's more about 'how do you do it?' in the context of a
modern, optimizing C99 compiler.
