[cfe-dev] [analyzer] Toning down invalidations?
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Nov 27 07:24:24 PST 2017
So far we've been preferring aggressive invalidation of large chunks of
the program state whenever we encounter anything we cannot model, be it
a function without source code or a language construct we don't support.
As i've mentioned recently, it does both eliminate certain kinds of
false positives (when we're uncertain we assume less incorrect stuff)
and introduce other false positives (when we're uncertain, we may think
that something might be possible, while in fact it isn't). Hence,
invalidation is a trade-off. Hence the question: does everybody like the
current trade-off? I'm particularly thinking about two parts of it:
* Invalidating base region when a field is invalidated (as the whole
base region is reachable through safe pointer arithmetic).
* Invalidating heap regions passed to us by const pointers when heap is
invalidated (as a non-const pointer may exist elsewhere).
Please let us know if you have other cases you'd like to think about :)
---
A bit of an explanation/motivation here.
Invalidation alone doesn't cause false positives all by itself; it only
amplifies false positives that result from our eager approach to state
splits, which causes us to sometimes produce more state splits than
necessary. For instance, in
// example 1
if (x) { ... }
invalidate(&x);
if (x) { ... }
we may assume that the first if() takes the true branch, and the second
if() takes the false branch. If the invalidation is overly aggressive,
and in fact "x" is untouched, then this path is infeasible. However, the
same problem arises in the following code that doesn't contain any
invalidation:
// example 2
if (x) { ... }
if (y) { ... }
By looking at this code, we can guarantee that both "x is true" and "x
is false" is feasible, otherwise if(x) is deadcode. We can also
guarantee that both "y is true" and "y is false" is feasible, otherwise
if(y) is deadcode. But we cannot guarantee that "x is true and y is
false" is a feasible branch just by looking at the code, hence this path
might be a false positive, and as such can be suppressed by asserting
that "x" is false in the "y is false" branch.
Example 1 is a variant of example 2 in which symbol "y" is replaced with
symbol "invalidated x". Similarly, we may run into an infeasible branch
"original x is true, invalidated x is false". However, depending on how
obvious it is to the reader that this branch is infeasible (eg.
invalidate() certainly doesn't touch "x", or it may even be invisible),
it may be very frustrating to suppress the false positive with an assertion.
Hence the call for heuristics to see if some of those frustrating
invalidations can be toned down.
More information about the cfe-dev
mailing list