[cfe-dev] [analyzer] Toning down invalidations?

Mon Nov 27 07:24:24 PST 2017

So far we've been preferring aggressive invalidation of large chunks of 
the program state whenever we encounter anything we cannot model, be it 
a function without source code or a language construct we don't support. 
As i've mentioned recently, it does both eliminate certain kinds of 
false positives (when we're uncertain we assume less incorrect stuff) 
and introduce other false positives (when we're uncertain, we may think 
that something might be possible, while in fact it isn't). Hence, 
invalidation is a trade-off. Hence the question: does everybody like the 
current trade-off? I'm particularly thinking about two parts of it:

* Invalidating base region when a field is invalidated (as the whole 
base region is reachable through safe pointer arithmetic).
* Invalidating heap regions passed to us by const pointers when heap is 
invalidated (as a non-const pointer may exist elsewhere).

Please let us know if you have other cases you'd like to think about :)

---

A bit of an explanation/motivation here.

Invalidation alone doesn't cause false positives all by itself; it only 
amplifies false positives that result from our eager approach to state 
splits, which causes us to sometimes produce more state splits than 
necessary. For instance, in

   // example 1
   if (x) { ... }
   invalidate(&x);
   if (x) { ... }

we may assume that the first if() takes the true branch, and the second 
if() takes the false branch. If the invalidation is overly aggressive, 
and in fact "x" is untouched, then this path is infeasible. However, the 
same problem arises in the following code that doesn't contain any 
invalidation:

   // example 2
   if (x) { ... }
   if (y) { ... }

By looking at this code, we can guarantee that both "x is true" and "x 
is false" is feasible, otherwise if(x) is deadcode. We can also 
guarantee that both "y is true" and "y is false" is feasible, otherwise 
if(y) is deadcode. But we cannot guarantee that "x is true and y is 
false" is a feasible branch just by looking at the code, hence this path 
might be a false positive, and as such can be suppressed by asserting 
that "x" is false in the "y is false" branch.

Example 1 is a variant of example 2 in which symbol "y" is replaced with 
symbol "invalidated x". Similarly, we may run into an infeasible branch 
"original x is true, invalidated x is false". However, depending on how 
obvious it is to the reader that this branch is infeasible (eg. 
invalidate() certainly doesn't touch "x", or it may even be invisible), 
it may be very frustrating to suppress the false positive with an assertion.

Hence the call for heuristics to see if some of those frustrating 
invalidations can be toned down.