[cfe-dev] [analyzer] exploration strategies and paths

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Jan 29 16:44:38 PST 2018


On 29/01/2018 4:12 PM, George Karpenkov via cfe-dev wrote:
> Hi All,
>
> I was recently investigating bug reports with very long analyzer paths (more than a few hundred nodes).
> In many such cases the path is long for no good reason: namely, the analyzer would go around the loop 3 times before
> going further.
> The issue is surprisingly common, and it was exacerbated by a recent bump of the analyzer thresholds.

Yeah, I guess everybody who has used the analyzer has seen some of those 
nasty reports that iterate over a loop 4 times. Why does it find the 
issue on the last iteration rather than on the first one, given that we 
use a depth-first strategy? So it's a great long-overdue thing to fix.

George, do you have any non-internal before/after html reports to attach?

> The problem is reproduced on the following file:
>
> ```
> extern int coin();
>
> int foo() {
>      int *x = 0;
>      while (coin()) {
>          if (coin())
>              return *x; // null pointer dereference: the bug to report
>      }
>      return 0;
> }
>
> void bar() {
>      while(coin())
>          if (coin())
>              foo();
> }
> ```
>
> While the shortest path to the error does not loop around at all, the current version of the analyzer
> will go around the loop three times before going further
> (and we are quite fortunate that the unrolling limit for loops is three, otherwise it would keep going
> until that limit is reached).
>
> Multiple issues were discovered during the investigation.
>
> 1. The analyzer queue does not have a concept of priority, and performs a simple DFS by default.
> Thus, if the successor of the if-branch under the loop in "bar" that contains the desired destination is generated second,
> it will not be evaluated until the loop exploration limit is exhausted.
>
> 2. The previous issue slows down the exploration, but is not enough to get a pathological behavior of ultra-long paths.
> The second problem is a combination of:
> a) The block counter is not part of a node's identity, so a node A with a small block counter can be merged into a node B with a large block counter,
> and the resulting node keeps the block counter associated with B.
> b) The issue in (a) is triggered by our heuristic to abandon the function's exploration and switch to conservative evaluation
> if we are already *inside* the function when the block limit is reached.
>
> Issue (1) combined with (2-b) causes the problematic behavior: the issue is discovered on the longest path first,
> and by the time the shortest path gets to “bar”, the block limit is already reached, and the switch to conservative evaluation is performed.

2-a is not even required here.

With our DFS exploration order, on every iteration of the while-loop 
within bar(), we'd take the false-branch of the if() within bar() from the 
worklist, see that it goes back to the loop, and end up with new true-branch 
and false-branch nodes of the next iteration on top of the worklist. 
Then we pop the false-branch again, etc., until we run out of the block 
count limit while having 4 true-branches sitting in the worklist. Those are 
therefore evaluated in the opposite order, and the first time we enter 
foo() we'd be on the 4th iteration.
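
Here is a tiny standalone sketch of that pop order (purely illustrative: 
the node representation and the limit of 4 are made up for the example, 
this is not the actual CoreEngine worklist):

```
#include <cstdio>
#include <utility>
#include <vector>

int main() {
  // Toy LIFO worklist: back() is popped first, i.e. plain DFS.
  // Each "node" is just (loop iteration, which branch of the if() in bar()).
  std::vector<std::pair<int, char>> Worklist;
  const int BlockCountLimit = 4; // assumed per-path block visit budget

  for (int Iter = 1; Iter <= BlockCountLimit; ++Iter) {
    Worklist.push_back({Iter, 'T'}); // true-branch: would call foo()
    Worklist.push_back({Iter, 'F'}); // false-branch: goes around the loop
    Worklist.pop_back();             // DFS immediately follows the false-branch
  }

  // Budget exhausted: only the deferred true-branches remain, deepest first.
  while (!Worklist.empty()) {
    std::printf("enter foo() on iteration %d\n", Worklist.back().first);
    Worklist.pop_back();
  }
  // Prints 4, 3, 2, 1: foo() is first inlined on the 4th iteration.
}
```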

This situation can happen regardless of in which order we evaluate 
if()-branches, by slightly modifying the example. So if the idea in the 
previous paragraph is unclear, it should still be obvious that sometimes 
we'd run into a function call on the longer path earlier than on a 
shorter path.

Now, once we enter foo() and immediately find the bug, we also run out 
of the block count limit within foo(). Recall that we are on the 4th 
iteration of the while-loop in bar(), and this is where the bug is 
found. Once the evaluation of foo() is over, we record that we failed 
to fully inline it, conclude that it's probably too complex, and decide 
to evaluate it conservatively from now on.

It means that on the 3rd, 2nd, and 1st iterations we won't be able to find 
the bug, because foo() is evaluated conservatively there. So we're stuck 
with the long report forever.
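
The bookkeeping behind that heuristic is roughly the following 
(hypothetical names, nothing like the real ExprEngine interface, just to 
make the mechanism concrete):

```
#include <set>
#include <string>

// Hedged sketch of the "don't inline again after bailout" bookkeeping.
struct EngineSketch {
  std::set<std::string> AbandonedFns; // functions we bailed out of mid-inlining

  bool shouldInline(const std::string &Fn) const {
    // Heuristic (2-b): once we ran out of block budget *inside* Fn,
    // evaluate every later call to Fn conservatively instead.
    return AbandonedFns.count(Fn) == 0;
  }

  void onBlockLimitReachedInside(const std::string &Fn) {
    // The long 4th-iteration path poisons foo() for the shorter
    // 3rd/2nd/1st-iteration paths that are still in the worklist.
    AbandonedFns.insert(Fn);
  }
};
```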

> Thus there are two mitigation strategies currently being evaluated:
>
> i) Remove the heuristic in (2-b)
> ii) Use a priority queue to hold nodes which should be explored; prefer nodes which give new source code coverage over others
> (or alternatively prefer nodes with least depth of loop stack)
>
> Artem and I have evaluated option (i), and the results were surprisingly good: some reports disappear, and slightly more reports appear in their place.
> The quality of the new reports seems to be slightly better, and I am still trying to figure out the exact reasons.

Yeah, I guess some explanation is necessary here. The skew of results is 
pretty huge, and it's surprising that the number of reports actually 
increases.

Just to be clear, both replay-without-inlining and 
dont-inline-again-after-bailout heuristics were disabled in this test.

> I suspect merges resulting from heuristic (2-b) cause us to lose some actually valid reports.

Because replay-without-inlining is disabled, there should not be many 
merges.

> Option (ii) has not been fully evaluated yet, but current experiments show slightly more reports (5-10%) and a radical decline in report lengths
> (e.g. from 400+ nodes down to fewer than 100 for the largest reports).
>
> Are there any thoughts on the matter?
>
> Personally I think we should do both (i) and (ii), even if they shake up the results.
> - The original idea for heuristic (2-b) was to be able to produce a report even if we are out of budget, but since it actually results in fewer reports,
> I think the data does not validate the approach.
>
> - Option (ii) is, AFAIK, how most similar engines work, and it should get us much larger coverage (and shorter paths) for the same node budget,
> even at the cost of the O(log N) overhead of the priority queue. Moreover, not having a priority queue will bite us later if we ever decide to further
> increase the analyzer budget or the unroll limit.

In the example above, (ii) means evaluating the first true-branch of the 
if() in bar() before the second false-branch of the if() in bar(), 
simply because it's *on an earlier loop iteration*. This does indeed 
sound like the right thing to do logically; hopefully we'd be able to 
confirm it with a more careful evaluation.
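
To make it concrete, the worklist for (ii) could be shaped roughly like 
this (the node fields are hypothetical; the real thing would hold 
ExplodedNodes and derive these properties from unvisited CFG blocks 
and/or the loop stack):

```
#include <queue>
#include <vector>

// Hypothetical worklist entry for the priority-based exploration sketch.
struct NodeSketch {
  unsigned LoopDepth; // how many times we've gone around enclosing loops
  bool NewCoverage;   // does this node reach a not-yet-visited block?
};

struct Prefer {
  bool operator()(const NodeSketch &A, const NodeSketch &B) const {
    // std::priority_queue pops the "largest" element, so return true when
    // A is *less* desirable than B: prefer new coverage first, then fewer
    // loop iterations.
    if (A.NewCoverage != B.NewCoverage)
      return !A.NewCoverage;
    return A.LoopDepth > B.LoopDepth;
  }
};

using PriorityWorklist =
    std::priority_queue<NodeSketch, std::vector<NodeSketch>, Prefer>;
```

With something like this, the 1st-iteration true-branch would be popped 
before the queued-up deeper iterations, which is where the shorter 
reports would come from.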

>
> George
>
>   
>