[cfe-dev] CFRefCount Problem #2: Region Invalidation

Tue Aug 23 16:43:08 PDT 2011

On Aug 22, 2011, at 9:58 PM, Jordy Rose wrote:

> [Background: Currently we're trying to eliminate the concept of "transfer functions" from the static analyzer, which has gradually been supplanted by the Checker interface. The only implementation of the transfer functions, CFRefCount, has two roles: general-purpose function and method evaluator, and retain-count checking for the Cocoa and CoreFoundation frameworks. The former is being moved to ExprEngine*, the latter to a regular checker called RetainReleaseChecker. Of course, there are always design issues; I'm trying to make sure they get cleared up on-list one by one.]
> 
> After recent patches, I've managed to separate the generic evaluation of functions and Objective-C messages -- basically, invalidating the arguments, implicit 'this' or message receiver, and globals -- from the specific retain-count-related effects of a couple functions. This felt like quite a major step forwards in the transition; everything else is just a sort of extrication and working out couplings that shouldn't exist.
> 
> One of those couplings, however, is that the /reference counts/ of arguments and message receivers should /not/ be invalidated on a call...but the reference counts of any /members/ should be. This is currently accomplished by use of a private shared "whitelist" of top-level symbols in a function call, set up by the transfer functions and then used by the RetainReleaseChecker. If the RetainReleaseChecker is to lose its privileged status, it won't be able to know which symbols should have their bindings preserved.
> 
> I'm pretty stuck on this one, but I came up with a few possibilities:
> 
> 1. Restore bindings in a post-call check by walking backwards in the graph to before the call's return value is set.
> 
> 2. Add the current statement to the checkRegionChanges callback, which would enable RetainReleaseChecker to skip arguments and receivers.
> 
> 3. Instead of using checkRegionChanges to handle bindings of objects passed by reference, just explicitly stop tracking anything passed by reference in a post-call check. Needs fleshing out for structs, C++ constructors, etc though.
> 
> I think the first way is the cleanest, but it's certainly extra work.
> 
> Thoughts?
> Jordy

I think (1) goes against the spirit of what we want to accomplish.  The region invalidation logic is driven by core logic for handling the "unknown effects" of a function/method call.  That should be preserved unless we have a reason not to do so.  Tampering with the bindings is against the spirit of doing that.  I also don't think this is simple because region invalidation can tamper with the internal representation of a Store in many ways.  I also think invalidation of the values of the ivars/field of an object is separate from the invalidation of the reference count.

I'm not certain what (3) means.  checkRegionChanges() includes the transitive closure of everything that get's invalidated.  That's the whole point of this callback.

I think (2) is reasonable, although not complete.  It allows checkers to see what values were passed as parameters/receivers, and do anything special that they'd like.  That seems like a very clean solution, and it leaves checkers in charge of deciding the policy they want to have.  However, it doesn't handle the following case:

 @interface Foo {
   Foo *fooRef;
 }
 ...
 @end

 ...
 Foo *x = ... // now tracking 'x'
 ...
 x-> fooRef = x;
 ...
 unknownFunction(x);

In this example, 'Foo' is a recursive data structure.  Here the value of 'x' is passed to unknownFunction(), and given (2) the checker could distinguish that 'x' was invalidated because it was a parameter and thus we should preserve the reference count information.  But we have a cyclic reference here? Because the object referenced by 'x' appears elsewhere in the visited object graph other than just being a function parameter, we should probably invalidate the reference count here as well.

To make this clearer, consider:

  ...
  Foo *x = ... // now tracking 'x'
  ...
  Foo *y = ... // now tracking 'y'
  ...
  y->fooRef = x;
  ...
  unknownFunction(y);

In this case, we will unambiguously invalidate the reference count of the object tracked by 'x', but how is this really any different than the first case?  We really don't have any special knowledge here of how the reference count of the object referred to by 'x' should be handled since it is nested in the object graph.  Both cases are the same in this regard.

So, (2) is not enough.  In order to make (2) actionable, we will also need to keep a count of the number of times a region was invalidated, and report that in the callback.  If a region was invalidated once, and appears as parameter in the argument list, then we known unambiguously that the function argument was the sole reference to the object and thus we don't need to stop tracking it's reference count.  The logic on the client side for reasoning about this is simple.

That said, I think (2) *without* the invalidation counts is good enough to get started.

Is there a particular reason you were advocating (1)?