[cfe-dev] Removing "TransferFuncs" from the analyzer

Tue Jul 12 23:10:19 PDT 2011

On Jul 12, 2011, at 12:06 AM, Jordy Rose wrote:

> Looking at this some more, it seems the only information the function summaries provide that aren't also available through RefVals are:

Hi Jordy,

The main role of the summaries is to memoize the computation of the transfer function logic.  Those should stay around primarily to make the checker faster.  You are correct, however, that they are currently used for generating the diagnostics

> 
> 1. Treating [object dealloc] differently from [object release]. This could be replaced by examining the current ExplodedNode's Stmt every time we hit a Released RefVal, or by adding a ReleasedByDeallocation RefVal::Kind. Or we could ignore it since it is so rare in Cocoa code. (I suppose non-Cocoa code doesn't have to be ref-counted for the RetainReleaseChecker to catch alloc/dealloc balances.)

This is a QOI thing, and I think it is worth keeping.  Adding a new RefVal::Kind seems inexpensive.  All of the QOI in the retain/release checker is there for a reason (because users have explicitly requested them).  I see no reason to regress on functionality here, especially when we have a way to represent them in a cleaner model.

> 
> 2. Noting that (NS|CF)MakeCollectable is ignored in non-GC mode. This is definitely nice QoI-wise but isn't (IMHO) a critical part of the bug report's usefulness.
> 
> 3. Noting that -autorelease, -retain, and -release are ignored in GC mode. Same as above.
> 
> Noting that X is ignored is something that's a little annoying to do while trying to keep GRState explosion down. My initial thought was to add a counter to RefVal for these messages, but that would mean that something like this would create two states after the branch is done (i.e. no uniquing the states).
> 
> id object = getObject();
> if (arbitraryCondition()) {
>  [object retain];
>  doSomethingWith(object);
>  [object release];
> }
> 
> Another possibility is a simple single-bit toggle; that way any ignored messages will still show up as a change in the state of the referred variable, and only branches with an odd number of ignored messages will result in distinct GRStates. This feels like more of a hack, but it costs less in memory (it could even be combined in a bitfield with the object kind).

I have another suggestion that avoids the state explosion problem, but it requires more infrastructure: annotated GRStates.

Essentially we are trying to reconstruct information here on how a GRState was constructed at a particular program point along a specific path.  The problem is that recording path-specific information in a GRState is dangerous because it can lead to state bifurcation.

Consider a new concept: AnnotatedGRState.

In pseudo-code:

class AnnotatedGRState {
  const GRState *state;
  const Annotations *annotations;

public:
  AnnotatedGRState(const GRState *st, const Annotations *annots = 0) : state(st), annotations(annots) {}
  operator const GRState*() const { return state; }
};

and the following modification to ExplodedGraph:

  ExplodedNode* getNode(const ProgramPoint& L, const GRState *State,
                        bool* IsNew = 0);

becomes

  ExplodedNode* getNode(const ProgramPoint& L, const AnnotatedGRState &State,
                        bool* IsNew = 0);

The main change is that getNode() accepts a GRState with optional annotations.  Those annotations aren't stored directly in ExplodedNode (we just keep a GRState*), but in a side-table in ExplodedGraph because annotations will be so rare.  We can then query the ExplodedGraph for the annotations for a given ExplodedNode.

Of course I've left the definition of "Annotations" ambiguous; the idea is to provide something that a checker can scribble on top of a state that can be preserved in the ExplodedGraph but does *not* taint the GRState with path-specific information that we wouldn't want in subsequent nodes in the ExplodedGraph.

This approach solves a variety of problems:

(1) CFRefCount doesn't need to grovel through the AST when generating diagnostics, like it does right now, to infer "what happened" in a state transition to generate the right diagnostic.  It just generates an annotation up front when it does the state transition (for those cases where we need to record more information), which gets stuck on the ExplodedNode.  Subsequent ExplodedNodes don't inherit the annotations; they just get the GRState.

(2) This is a general problem that all checkers need to be able to solve.  Most checkers want to have rich diagnostics, but shouldn't need to go through the same kind of complexity that CFRefCount does to generate good diagnostics.

(3) Annotations have a variety of possible implications besides just better diagnostics; they are a way of tagging points along paths that are of interest to a checker.

What do you think of this approach?

> 
> With either way of forcing a change, we'd then examine the current ExplodedNode's statement to see what was wrong.
> 
> I'm going to start working on this soon, using the bit toggle for the ignored messages, and try to come up with orthogonal patches that chip away at TransferFuncs and CFRefCount!

I won't mince words here: I really am opposed to the bit toggle approach.  It doesn't feel clean, and I think there overall is a much better general approach (e.g., annotated states) that will be useful for a wider range of checkers.  We shouldn't go down the path of stuffing *transient* information into a GRState that has the possibility of bifurcating the exploded graph.  If we find ourselves tempted to do that, it seems to me that we are lacking the needed infrastructure to do the right thing.