[cfe-dev] Even more clang ideas

Ted Kremenek kremenek at apple.com
Fri Jul 25 18:28:09 PDT 2008


On Jul 23, 2008, at 8:36 PM, John Engelhart wrote:

> Geeze, after I sent my message I came up with a few more...
>
> Here's the -W warnings I usually (try) to run with:
>
> -Wmissing-prototypes -Wreturn-type -Wformat -Wunused-parameter
> -Wunused-variable -Wunused-value -Wuninitialized -Wshadow
> -Wsign-compare -Wshorten-64-to-32 -Wextra -Winit-self -Wsequence-point
> -Wswitch-default -Wstrict-aliasing=2 -Wundef -Wpointer-arith
> -Wbad-function-cast -Wstrict-prototypes -Wmissing-declarations
> -Wredundant-decls -Wunreachable-code -Wcast-align -Wdiscard-qual
> -Wcast-qual -Wstrict-overflow=5
>
> "32 <-> 64 bit issues / potential problems".
>
> The warning flag '-Wshorten-64-to-32' is a good start. '-Wconversion'  
> is also useful when switching to 64 bits.  clang can leverage  
> 'meta-information' against the problem too, like using the knowledge  
> that NS*Integer can switch sizes between 32 and 64 bits, but a 32-bit  
> int assigned/passed to an NSInteger won't scale the same way.  clang  
> can help catch these potential gotchas early, before they become  
> difficult-to-undo/fix problems, as these types of things can go  
> silently undetected at default compiler warning levels. '-Wcast-align'  
> is also something that can be checked/validated.

64 -> 32 bit issues are definitely a good source of checks, especially  
for code that must run on different archs.  Your example about  
NSInteger is an especially good one.

I actually recently implemented a simple check related to this topic,  
concerning the use of CFNumberCreate.  If one isn't careful with the  
use of this function, on a 64-bit architecture CFNumberCreate can  
actually fail to initialize some of the bits of the freshly created  
CFNumber because the integer size is greater than that of the integer  
provided by the programmer.  I think a lot of little simple checks  
like these would be both (a) relatively easy to implement and (b)  
potentially catch a lot of subtle bugs.

The design of the static analysis library is to help make the  
implementation of these checks relatively straightforward without any  
deep program analysis knowledge.  I myself won't be able to implement  
all of these checks, and hopefully as the tool evolves others will  
feel comfortable in implementing some of these checks as well.

> "Cross architecture issues"
>
> I can't think of any off the top of my head, but collecting possible  
> cross architecture issues patterns would be helpful.

I think this basically relates to the previous issue: APIs and type  
definitions can have different invariants or properties on different  
archs.  Some of these invariants could be checked readily with static  
analysis.

> "Possible restrict and const qualification recommendations (and  
> validation)"
>
> This really requires deep inter-procedure analysis, but if it's  
> available, then clang might be able to reason certain things about  
> the inter-procedure effects and possibilities.  Const can sometimes  
> lead to better code generation, but the real wins are usually  
> possible with restrict.  If deep inspection is possible, then some  
> degree of validation of the use of a restrict qualified pointer is  
> probably possible as well.

I don't have a reference off the top of my head, but I do know there  
was some research on doing exactly what you suggest.  Accurately  
inferring const and restrict may require a fairly precise points-to  
analysis, which gets tricky with all the messiness of C.  That said,  
this is something that could potentially be done, at least in some  
localized cases.

> Actually, since you're obviously deep down in the guts of the  
> grammar and compiler interactions, maybe you can offer an opinion on  
> the following:  Historically, Objective-C was effectively nothing  
> more than a fancy pre-processor front end to the C compiler.  In  
> fact, there was often a trivial one-to-one mapping from an  
> Objective-C statement to a plain C statement.
>
> @interface MyClass : ParentClass { char *buffer; } @end
> becomes something like
> typedef struct { /* ParentClass definitions */ char *buffer; }  
> MyClass;
>
> When you get right down to it, there's nothing special about a  
> 'class'; it's literally nothing more than a struct.
>
> Now, object oriented programming is built on polymorphic abilities,  
> each class inherits all of its super classes methods/ivars/etc.  So  
> if we have the following:
>
> @interface MyClass : ParentClass { char *buffer; } @end
> @interface MyMutableClass : MyClass { int mutationCount; } @end
>
> In code, we refer to an instantiation of one of these objects with:
>
> MyClass *myClassObj;
> MyMutableClass *myMutableClassObj;
>
> OO programming (and objc) allows for the following to take place:
>
> myClassObj = myMutableClassObj;
>
> because MyMutableClass is a subclass of MyClass.  No problem, right?
>
> I'm of the opinion that this is actually a problem.  The problem has  
> nothing to do with the (correct) OO design paradigm or any  
> particular conceptual fault, but it has to do with C.
>
> Objective-C was designed a long time ago, in the pre-ANSI K&R days  
> as a matter of fact. Such assignments were possible under older K&R  
> and (I think, but may be wrong) ANSI rules.  It was frowned upon,  
> wasn't terribly good style, but you could do it and for most  
> architectures this isn't a problem because the compiler essentially  
> treated all pointers as equivalents.  Of course, the compiler is  
> free to perform pointer alignment due to the assignment, but this  
> never happened in practice (at least not for any of the main  
> architectures that are still with us today).
>
> The @interface definitions are literally like the following statements:
>
> typedef struct { char *buffer; } MyClass;
> typedef struct { char *buffer; int mutationCount; } MyMutableClass;
>
> Or, if we really wanted to, we could drop the typedef and declare it  
> as any other struct.  Pointers to 'instantiated objects' in code are  
> either identical to their Objective-C counterparts if typedefs are  
> used, or something like the following if structs are used:
>
> struct MyClass *myClassObj;
> struct MyMutableClass *myMutableClassObj;
>
> Fast forward to C99 and consider the same statement:
>
> myClassObj = myMutableClassObj;
>
> In C99, this statement is expressly forbidden as 'pointers of one  
> type may not point to a different type (except void)'.  Only  
> pointers of the same type may alias each other.  This is the 'strict  
> aliasing' rule(s).
>
> So... there's a bit of a conflict.  Such pointer aliasing is  
> permitted under the concepts of object oriented programming, but it  
> is expressly forbidden under C99 rules.  From a purely compiler  
> perspective, when you prototype a method as
>
> - (NSArray *)someMethod;
>
> you literally mean that you are returning a type of NSArray *, and  
> not any of its subclasses.

I'm not certain that the C99 rules apply in this way to Objective-C  
types, since the Objective-C type system is completely outside the  
scope of C99.  The fact that Objective-C was originally implemented as  
a layer above C just means the compiler had less information to go  
on.  One can easily get around the problem you mentioned by having the  
C implementation of Objective-C just use void* for all Objective-C  
object references (or, as you point out later, simply disable strict  
aliasing rules for Objective-C code).

>  It is, in fact, an error to return an NSMutableArray in a method  
> that's prototyped to return an NSArray due to C pointer aliasing  
> rules. The 'id' type is the closest thing that Objective-C has to a  
> 'generic object pointer type', so if a method wants to return a  
> pointer to an object of more than one type, it really should declare  
> the return type as 'id'.  Again, this is due to the C pointer  
> aliasing rules rather than any OO conceptual rules.

Again, I'm not certain how much C99's aliasing rules apply to  
Objective-C object references.  Objective-C doesn't have a formal  
specification akin to C99, so the specification (if you want to call  
it that) is whatever the current compiler implementation allows.

There are others on this list that can comment on this particular  
issue with much more authority than myself.

> It really starts to become a problem when you turn on the optimizer  
> and it begins to do optimizations that are dependent on this  
> aliasing invariant.  When I realized that this could actually be a  
> serious, very subtle problem, and started digging I found evidence  
> to support it.  For example, '-fstrict-aliasing' is disabled in  
> Apple's GCC for ObjC code.

Interesting.  I think this illustrates my point that the strict  
aliasing rules in C99 don't really apply to Objective-C, at least in  
the implementation provided by GCC.  This is clearly a deliberate  
choice, likely to avoid the issues you mentioned.

>  Using '-fast' on .m files causes the compiler to emit 'cc1obj:  
> warning: command line option "-fast" is valid for C/C++ but not for  
> ObjC'
>
> I'm of the opinion that Objective-C and C are so closely linked  
> together that one cannot simply say 'Pointers cannot alias pointers  
> to different types.  Except for ObjC class-typed pointers, which can  
> alias pointers to any of their subclasses.'

It gets even more interesting when one considers categories, which  
allow one to implement essentially "open types" in Objective-C.  The  
highly dynamic nature of Objective-C allows one to change the methods  
implemented by a class at runtime, which can essentially change the  
subtyping relationships between objects at runtime.  In that sense,  
the class hierarchy is only a set of guidelines for subtyping  
relationships between Objective-C objects.  From that observation, I'm  
not certain that any conservative strict aliasing assumptions could be  
made by the compiler concerning Objective-C objects.

>  It's just not possible from a practical standpoint, ESPECIALLY in  
> something like GCC where it's pragmatically impossible to separate  
> out the two languages.

I'm not an expert on the GCC IR where the optimizer does much of its  
work, but the GCC frontend has a notion of the Objective-C type  
system, and uses that information to issue warnings in some cases.   
For example:

#include <Cocoa/Cocoa.h>

void foo() {
   NSString* s;
   NSObject* o;

   o = s;   // fine: every NSString is an NSObject
   s = o;   // warns: 'o' may not refer to an NSString
}

gcc emits a warning for the assignment of 'o' to 's' because the  
object referred to by 'o' may not be an NSString:

/tmp/t.m:8: warning: assignment from distinct Objective-C type

If one could use the Objective-C class hierarchy information to make  
conservative assumptions for use with strict aliasing optimizations,  
I'm not certain why you think gcc couldn't use that information.  The  
point I made above, however, means that even having the class  
hierarchy information available may not be enough to make such  
assumptions.

- Ted
