[cfe-dev] Removing "TransferFuncs" from the analyzer

Fri Jul 15 00:14:28 PDT 2011

On Jul 14, 2011, at 22:50, Ted Kremenek wrote:

> That said, you are right about ExplodedNodes being FoldingSetNodes.  Annotations really have more to do with the *transition* between two nodes, so perhaps the annotation should be associated with the edge between the predecessor and the successor node?  That way you can have multiple annotations associated with an ExplodedNode, but those annotations are contextual to the transition.  If we did this, then maybe ExplodedNode:;getNode() wouldn't reason about annotations at all.  Instead, Checker::addTransition() could call generate the new node, and then tell ExplodedGraph to add an annotation between the predecessor node (which CheckerContext knows about) and the new node.  The annotation would then go into a side table.
> 
> I actually really like this idea.  Right now CFRefCount (and possibly other checkers) need to walk the ExplodedGraph path to *infer* transition information.  Annotations just record that information for transition.

I like this too. Most BugReporterVisitors are looking for the first state where some condition holds, but that's implemented by comparing a node and its predecessor. Having methods explicitly reasoning about at edges could make this a lot easier.

On the other hand, can two checkers generate the same edge, differing only in annotation data? Since ExplodedNodes are uniqued, and their pred/succ sets are uniqued, my guess is that this is entirely possible. That seems to make the difference between an edge and a node much less interesting.

> If we got clever with our representation, and we wanted the option to have annotations be lightweight but possibly prolific, we can possibly optimize how edge sets are represented in ExplodedNode to better store annotations.  I see that as an optimization problem.  I really believe that the majority of nodes, which are not checker generated, would not have annotations.

The dumb way seems to be a DenseMap<pair<ExplodedNode*, ExplodedNode*>, Annotation>, where Annotation is some kind of small map data structure. Alternately, Annotation could be a simple tag-value pair—if checkers want to store more than one annotation in a single edge, they have to do it manually, under a single tag. I think the "two checkers, same edge" problem is going to rule this out though.

(This has echoes of the SummaryMap in CFRefCount now, but available to all checkers. I guess that's all right.)

>> I guess the big question is whether or not this is actually necessary: will checkers need a way to associate information with certain nodes /besides/ RetainReleaseChecker? Probably yes, but there should probably be a guideline to avoid over-using annotations, putting off most of the work until you know there's a diagnostic.
> 
> I think this would be useful for a variety of checkers.  The retain release checker would make use of it, and so would a general malloc/free checker, or really any type state checker (including the PthreadLockChecker).  Type state checking is one of the most frequent checking we can do.

I'm guessing you mean /diagnostics/ for type state checkers, because the actual type state /is/ path-sensitive non-transient information. I can't think of a time when a checker needs to look at a past node /except/ when reverse-walking the graph, and I don't know when checkers reverse-walk the graph besides emitting path diagnostics.

Oh wait, I can think of one: persisting the retain-release whitelist across evalCall. :-) But that's even more transient than usual, and shouldn't stay in the graph after evalCall finishes. So, just kidding.

> Another thing we could possibly provide are "implicit annotations."  For example, suppose we provide an API to query the annotations for an <ExplodedNode, ExplodedNode> pair.  Some of those could be explicitly recorded in a side table, and some of those could be lazily generated from a Checker via a callback.  For example, this is essentially what CFRefCount does: it reverse engineers state transitions when generating diagnostics.  For most cases it doesn't need the function summary (which could be captured by an explicit annotation recorded in the ExplodedGraph), and for those cases the annotation could be generated lazily and on-demand.  Mechanically, much of the logic in the RetainReleaseChecker for generating diagnostics would be the same, but it could be modularized based on a generic, and extensible annotation API for node transitions.

Would this be where you can ask a checker to get an annotation for /any/ edge? Or where a checker registers an annotation on a specific edge, but with a callback rather than an immediate value? (The former seems to have little benefit over explicitly calling a helper function.)

What do you see as a "usual" annotation? A direct translation of CFRefCount would be "any function summary", possibly optimized to "any function summary that affects ref counts". My original plan was "events that can't (easily) be inferred from a change in RefVal". It sounds like you might be suggesting "all changes in RefVals", so that we can directly ask something like

graph->getAnnotation<RefValChange>(edge, symbol);

I'll admit that looks pretty, but I'm not sure I'd want to be recording that much information in the table when you really can just compare the two nodes' states' RefVals. And if this calls a callback on RetainReleaseChecker, well, why not just do that to begin with?

this->getChange(edge, symbol);

...but mostly I'm just trying to play devil's advocate so that we don't pick something because it's the /only/ idea. (I'm glad you think adding the new infrastructure is worth not abusing the path-sensitive data; that's a call I wouldn't make.)

Jordy