[cfe-dev] Even more clang ideas
John Engelhart
john.engelhart at gmail.com
Sun Aug 3 15:52:58 PDT 2008
Sorry for the response delay...
On Jul 26, 2008, at 3:02 AM, Chris Hanson wrote:
> On Jul 25, 2008, at 6:28 PM, Ted Kremenek wrote:
>
>>> It is, in fact, an error to return a NSMutableArray in a method
>>> that's prototyped to return an NSArray due to C pointer aliasing
>>> rules. The 'id' type is the closest thing that Objective-C has to a
>>> 'generic object pointer type', so if a method wants to return a
>>> pointer to an object of more than one type, it really should declare
>>> the return type as 'id'. Again, this is due to the C pointer
>>> aliasing rules rather than any OO conceptual rules.
>>
>> Again, I'm not certain how much C99's aliasing rules apply to
>> Objective-C object references. Objective-C doesn't have a formal
>> specification akin to C99, so the specification (if you want to call
>> it that) is whatever the current compiler implementation allows.
>
> The current specification of Objective-C is "The Objective-C 2.0
> Programming Language" at <http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/>.
>
> John is also incorrect about the above: It is *not* an error, in
> Objective-C, to return an instance of a subclass from a method
> prototyped as returning an instance of the superclass.
Yes, this traces its roots back to Brad Cox's ideas and the original
StepStone objc compiler. This was all done in K&R C days, pre-ANSI
even, and pointer rules were an awful lot looser back then.
>
> Objective-C is its own language that extends C99, not a preprocessor
> for C99, and this is one of the extensions that Objective-C adds.
True enough, but Objective-C isn't exactly formally defined. As
someone here put it, it's pretty much "whatever the compiler happens
to compile."
While it's been a long time since I've hacked on GCC internals or
ported it to a new architecture, it used to be that objc was really
nothing more than an 'integrated preprocessor to GCC'. Instead of
preprocessing the source and rewriting objc statements into their
equivalent C statements, it directly creates the internal tree
representations, essentially 'rewriting' on the fly.
An example of an older objc preprocessor: ftp://ftp.wustl.edu/pub/aminet/dev/c/OCT-1.99.lha
It's interesting to note that this particular preprocessor seems to
be free from the influence of any other objc front end, and is
allegedly a fairly close translation of 'Object Oriented Programming:
An Evolutionary Approach', which laid out the bulk of objc (circa '86).
From a high-level view, it would seem that GCC still uses the object
== struct representation internally, effectively turning each class
into a struct, with each ivar a member of that struct (including all
of the ivars inherited from its superclasses). For example:
#import <Foundation/NSObject.h>

@interface MyObject : NSObject {
@public                      // public so main() can touch the ivars directly
    int count;
    void *ptr;
}
@end

@implementation MyObject
-(int)count { return(count); }
-(void)setPtr:(void *)newPtr { ptr = newPtr; count++; }
@end

int main(int argc, char *argv[]) {
    MyObject *obj = NULL;
    obj = [[MyObject alloc] init];
    int x = [obj count];     // message send
    [obj setPtr:NULL];
    x = [obj count];
    int y = obj->count;      // direct ivar access, just like a struct member
    void *optr = obj->ptr;
    obj->ptr = NULL;
    return(0);
}
When we look at the GIMPLE representation of this (gcc
-fdump-tree-gimple-all -c FILE.m), it looks like the same basic
preprocessor infrastructure is still in place:
;; Function -[MyObject count] (-[MyObject count])
-[MyObject count] (selfD.2219, _cmdD.2220)
{
  intD.0 D.2227;
  D.2227 = selfD.2219->countD.2206;
  return D.2227;
}
;; Function main (main)
main (argcD.2245, argvD.2246)
{
  // snip
  struct MyObject * objD.2249;
  // snip
  D.2273 = OBJ_TYPE_REF(objc_msgSend_Fast;D.2270->0) (D.2270, _OBJC_SELECTOR_REFERENCES_1.5D.2272);
  objD.2249 = (struct MyObject *) D.2273;
  // snip
  yD.2256 = objD.2249->countD.2206;
  optrD.2260 = objD.2249->ptrD.2207;
  objD.2249->ptrD.2207 = 0B;
  // snip
In other words, it's pretty much an ObjC-to-C preprocessed representation.
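To make the 'object == struct' point concrete, here is a minimal hand-written C sketch of roughly what the dump above implies for MyObject. It is an illustration under stated assumptions, not GCC's actual output: the `isa` member, the `SEL` stand-in typedef, and the function names `MyObject_count`/`MyObject_setPtr` are all hypothetical, and real message sends go through the runtime (objc_msgSend) rather than direct calls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime's selector type (an assumption). */
typedef const char *SEL;

/* The class becomes a plain struct. 'isa' is a hypothetical stand-in for
   the ivar inherited from NSObject; MyObject's own ivars follow it. */
struct MyObject {
    void *isa;
    int   count;
    void *ptr;
};

/* Sketch of -[MyObject count]: self and _cmd become ordinary parameters,
   mirroring "-[MyObject count] (self, _cmd)" in the dump. */
int MyObject_count(struct MyObject *self, SEL _cmd) {
    (void)_cmd;               /* unused, as in the original method */
    assert(self != NULL);
    return self->count;       /* compare: self->count in the GIMPLE */
}

/* Sketch of -[MyObject setPtr:]. */
void MyObject_setPtr(struct MyObject *self, SEL _cmd, void *newPtr) {
    (void)_cmd;
    assert(self != NULL);
    self->ptr = newPtr;
    self->count++;
}
```

With this shape, `obj->count` in main is nothing more than an ordinary struct member access, which is what the `objD.2249->countD.2206` line in the dump shows.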
C and ObjC are deeply intertwined. Because ObjC is built on top of C,
the rules and quirks of C bubble up to ObjC, but the reverse is not
necessarily true. In the old 'preprocessor' ObjC model, this wasn't a
problem: there was always a 1:1 translation. Whatever C code emerged
on the other side, you could analyze it in the context of standard C
rules. This model was chosen for its obvious simplicity and it
allowed for 'strict superset of C' compatibility.
> In fact, in Objective-C it is not possible to say "this method
> returns an instance of specifically this class and no other class"
> -- you can only say "this method returns an instance of this class
> or any subclass."
I disagree. Such things were possible in older compilers because of
the looser pointer rules of the time. Type punning was always frowned
upon, and is now strictly forbidden in C99. Except for unions, C
doesn't really provide a means for multi-type pointer representations.
It's not really a question of what the proper OO design paradigm is;
it's a question of "where the rubber meets the road": how do you
represent the concept in C?
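As a sketch of the union escape hatch mentioned above, assume a hypothetical flattened layout in which a subclass struct repeats its superclass's ivars up front (the struct names and the `isa` member are illustrative, not GCC's actual types). C99's common-initial-sequence rule (6.5.2.3) lets code read the shared prefix through either struct's view, but only when the access goes through a union that contains both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened layouts: the "subclass" struct repeats the
   "superclass" members first, then adds its own. */
struct NSArray {
    void  *isa;
    size_t count;
};

struct NSMutableArray {
    void  *isa;       /* same initial sequence as struct NSArray */
    size_t count;
    size_t capacity;  /* subclass-only member */
};

/* C99 6.5.2.3: if a union holds one of several structs that share a
   common initial sequence, that common part may be inspected through
   any of them while the union's complete type is visible. */
union AnyArray {
    struct NSArray        array;
    struct NSMutableArray mutableArray;
};

/* Reads 'count' through the NSArray view, whichever member is stored. */
size_t AnyArray_count(const union AnyArray *u) {
    assert(u != NULL);
    return u->array.count;
}
```

The guarantee is tied to the union: reading the same field through a bare `struct NSArray *` that actually points at a `struct NSMutableArray` gets no such promise from the standard, which is the gap this argument is about.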
Take the following Objective-C code:
-(MyObject *)who { return(self); }
The gimple representation is:
;; Function -[MyObject who] (-[MyObject who])
-[MyObject who] (selfD.2246, _cmdD.2247)
{
  struct MyObject * D.2251;
  D.2251 = selfD.2246;
  return D.2251;
}
So, when you say you are returning - (NSArray *)arrayByDoingSomething,
and we're using the 'whatever the compiler compiles' standard, you are
literally saying you are returning a 'struct NSArray *'. Under C99
rules, the meaning of this is very, very clear and unambiguous: you
return a pointer to a struct NSArray, and a struct NSArray ONLY. The
code that calls it is almost certainly going to be something like
'NSArray *array = [obj arrayByDoingSomething];', which again turns
into a 'struct NSArray *'.
To return anything other than a struct NSArray * breaks C99 type
aliasing rules. And because ObjC is so tightly integrated with C, and
because the bulk of the GCC compiler is geared towards C, one cannot
simply dismiss this as 'Well, that's how ObjC defines things.'
Because you have a pointer to an object, you must live within C99's
pointer rules. More to the point, you need to deal with the fact that
people writing the code generation parts of the compiler are going to
implicitly assume that C99's pointer aliasing rules are being followed,
and will write code that depends on that invariant being true.
>
> That is by design, and is not just an artifact of its original
> mid-1980s implementation as a preprocessor.
True enough. It was enabled by the fact that mid-1980s C compilers
allowed such 'free-for-all' type punning. But C moved on and
explicitly forbade such type punning. It is a far, FAR from trivial
matter to untangle the two.
http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
http://www.mail-archive.com/list@epicsol.org/msg00489.html
So, my question isn't really what the Objective-C 2.0 manual says
(which is very little, and definitely not 'standards definition'
quality), it's more about 'how do you do it?' (in the context of a
modern, C99 optimizing compiler).