[cfe-dev] Removing "TransferFuncs" from the analyzer

Fri Jul 15 01:11:55 PDT 2011

On Jul 14, 2011, at 22:21, Ted Kremenek wrote:

> On Jun 27, 2011, at 7:35 PM, Jordy Rose wrote:
> 
>> In fact, the behavior of a hypothetical SimpleTransferFuncs only includes this:
>> - Invalidating ivars of ObjC message receiver
>> - Invalidating ivars of 'this' for C++ method call
>> - Invalidating function/method args
>> - Invalidating globals if necessary
>> - Conjuring return values
>> 
>> And these effects only happen if no checker claims "evalCall" for a given function call.
> 
> Indeed, but why not just move these into GRExprEngine entirely?  Why do we even need a SimpleTransferFuncs?  The only reason to have a GRTransferFuncs interface is if we want the option to swap something else in its place in the future.

Hm, good point. I don't think we'd be replacing the current transfer funcs anytime soon, so we probably wouldn't need to keep them around in the future.

>  So why might we want to do this?  Well, there is the open issue of inter-procedural analysis.  One way to do that would be to swap in a different GRTransferFuncs.  But I honestly don't think that's the right direction.  I think inter-procedural analysis will be a core part of the analyzer engine.  Yes, we will likely modularize it, perhaps making the inter-procedural analysis configurable, but GRTransferFuncs isn't that mechanism.  GRTransferFuncs was meant to capture domain-specific checker reasoning, and we have something better for that now: the Checker interface.

> Now what about the mapping of ExplodedNodes to function summaries?  That is indeed a hack, because we don't have ways to annotate ExplodedNodes with information that checkers could use to reconstruct critical information for generating diagnostics.  If we had proper annotations for ExplodedNodes, we would be able to attach this information in a structured way to nodes.  That said, what would those annotations look like?  All the summaries do is record what transfer function logic was used at a given point.  That might not be the most efficient annotation, but that's a fairly basic one.  Moreover, summaries are reused, so we don't necessarily create many of them.  In the end you are just talking about a mapping from ExplodedNodes to a handful of commonly used summaries.  I'm not convinced that is a serious scalability issue.

I suppose it's not: in a 2000-node stress test this probably maxes out at 16~32KB to store the map; much less to store the cache of summaries. Hardly a big memory drain (numbers totally made up, though).

The discussion of ExplodedNode- or ExplodedGraph-edge-associated annotations should probably stay in the other thread fork. RetainReleaseChecker would most likely keep the summary manager code as is for its regular work; sorry to be unclear.

>> - There's no evalObjCMessage, for the very sensible reason that overriding a method can make its behavior totally different. Nevertheless, there are a few messages we /do/ want to model in a RetainReleaseChecker: -[NSAutoreleasePool init], -[NSAutoreleasePool addObject:], -retain, -release, and -autorelease. Fortunately, pretty much everything else would fit in a PostObjCMessage implementation.
> 
> I'm not convinced that these couldn't be modeled using a PostObjCMessage implementation as well.  There's nothing really special about them, and really we want to move to a place where all domain-specific knowledge can be modeled in checkers.
> 
> Consider -retain.  All it does is add a +1 retain count.  That could easily be a post condition.  The only other thing that is hard here is that -retain returns an alias to the receiver.  We don't have a good way right now for a checker to articulate that except with an evalObjCMessage, but I can envision other strategies.  For example, a checker should be able to say (for many reasons) that the return value aliases the receiver, and articulate that as a /constraint/ to the ConstraintManager.  This requires the ability to possibly unify symbols (i.e., the symbol for the receiver's value and the symbol for the return value), but it is really powerful.  I also think this generalizes well to inter-procedural analysis.  For example, suppose we had *both* domain-specific knowledge from a checker about a given function as well as information garnered from analyzing the function's implementation.  That means we have two sources of information about the return value of the function call.  The ability to articulate domain-specific knowledge using *declarative* constraints is really the only way to naturally unify such information.

IPA's going to wreak havoc on our current evalCall-based checkers. :-) I agree that the 'retain' effect is easy to model post-call, but the aliasing thing is a problem. A postExpr-based checker can't even assume that a conjured return value hasn't already been squirrelled away in some other checker's slice of the GDM.

Even if we tag it with a FIXME and/or limit evalObjCMessage to RetainReleaseChecker, I think we should take the shortcut for now and save the aliasing problem for later. Though we should probably /only/ use it for receiver-aliasing -retain and -autorelease, and use postExpr<ObjCMessageExpr> for everything else (including the other effects of -retain and -autorelease).

> In the absence of symbol unification, however, we can probably special case the aliasing issue, at least for this specific case.  For example, we can create another (perhaps transient) Checker callback that asks: does this function/method return an alias?  Then ExprEngine can get in the business of doing the aliasing, and the checker just handles the truly checker-specific stuff (e.g., the reference counts).

That seems like a very specific check that would only be useful until we got a [[return_alias]] attribute, then got it into all the frameworks. :-) But it would solve the problem.

>> - When function/method arguments are invalidated, their reference counts persist. But any other sort of invalidation does destroy reference count records. Currently this uses a whitelist built before invalidating the various regions referenced in a call, but with no special consideration given to RetainReleaseChecker, that'd be a little strange. I can think of two ways around this: reconstruct the whitelist on every region invalidation if we're in a function call (probably not a good idea) and reconstruct the reference bindings in a post-call check (by walking backwards through ExplodedNodes to the pre-call state).
> 
> I think reconstructing the bindings would be extremely gross.  You'd basically be hacking on semantics in what's in the Store.  Unless the checker is trying to simulate a function call that has "writes" in the background, we shouldn't go this way.
> 
> I have a conflicting opinion, in that I think we just don't have the right APIs right now to express a dialog between the invalidation and the checker logic.  In the case of the RetainReleaseChecker, we have information when we desire not to invalidate specific symbol values, and we do that with a whitelist.  But region invalidation should only happen in a few contexts.  When we invalidate regions, we can make it a cooperative process between the StoreManager and the checkers.  The checkers are told what regions are being invalidated, and *why*, and the checkers should have enough information to determine if that reason is sufficient to invalidate whatever checker-specific state they are tracking.

That's not a bad idea, but it's hard to do. Currently, the StoreManager has no idea why things are invalidated, either; it just has a "current expr" pointer. (It's worth noting that currently we /only/ invalidate regions due to function or methods calls, including C++ 'new' calls and constructors...and also including CStringChecker::evalCall. So maybe passing the current expr down would be "good enough".)

Hm, maybe I should stop forking the threads of conversation, and stick to one issue at a time. Though it might be worth checking in the "free" changes that can be migrated cleanly out of CFRefCount, so that all that's left is the actual problems.

Thanks for your patience and all the help with this. :-)
Jordy