[cfe-dev] [analyzer] exploration strategies and paths

George Karpenkov via cfe-dev cfe-dev at lists.llvm.org
Mon Jan 29 16:12:05 PST 2018


Hi All,

I was recently investigating bug reports with very long analyzer paths (more than a few hundred nodes).
In many such cases the path is long for no good reason: the analyzer goes around the loop three times before
moving on.
The issue is surprisingly common, and it was exacerbated by a recent bump of the analyzer thresholds.

The problem can be reproduced on the following file:

```
extern int coin();

int foo() {
    int *x = 0;        // x is null on every path
    while (coin()) {
        if (coin())
            return *x; // null dereference: the bug being reported
    }
    return 0;
}

void bar() {
    while (coin())
        if (coin())
            foo();
}
```

While the shortest path to the error does not go around the loop at all, the current version of the analyzer
will go around the loop three times before going further
(and we are quite fortunate that the loop-unrolling limit is three; with a larger limit it would keep going
around until that limit was reached).

Multiple issues were discovered during the investigation.

1. The analyzer worklist has no concept of priority and performs a simple DFS by default.
Thus, if the successor of the if-branch under the loop in "bar" that contains the desired destination is generated second,
it is not evaluated until the loop exploration limit is exhausted (a toy worklist sketch follows after this list).

2. The previous issue slows down the exploration, but by itself it is not enough to produce the pathological behavior of ultra-long paths.
The second problem is a combination of:
a) The block counter is not part of a node's identity, so a node A with a small block counter can be merged into a node B with a large block counter,
and the resulting node keeps the block counter associated with B (illustrated in the second sketch below).
b) The issue in (a) is triggered by our heuristic of abandoning the function's exploration and switching to conservative evaluation
if we are already *inside* the function and the block limit has been reached.

Issue (1) combined with (2-b) causes the problematic behavior: the bug is discovered on the longest path first,
and by the time the shortest path gets to "bar", the block limit has already been reached, and the switch to conservative evaluation is performed.
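
To make (2-a) concrete, here is a minimal sketch with stand-in types (not the analyzer's ExplodedNode or ExplodedGraph APIs): the cache key deliberately omits the block counter, so a node that arrives later on a short path merges into the already-cached node and inherits its large counter, behaving as if the block budget had already been spent:

```cpp
#include <cstdio>
#include <map>
#include <utility>

// Stand-in node: identity is (program point, state); the block counter is
// deliberately not part of the key, mirroring issue (2-a).
struct Node {
  int Point;           // stand-in for a program point
  int State;           // stand-in for a program state
  unsigned BlockCount; // number of basic blocks visited to reach this node
};

int main() {
  std::map<std::pair<int, int>, Node> Cache; // keyed on (Point, State) only

  // A node reached on a long path (many loop iterations) is cached first.
  Node LongPath{/*Point=*/7, /*State=*/42, /*BlockCount=*/12};
  Cache.emplace(std::make_pair(LongPath.Point, LongPath.State), LongPath);

  // A later node with identical (Point, State) but a small counter merges
  // into the cached one; the large counter "wins", so the shorter path
  // behaves as if it had already exhausted the block budget.
  Node ShortPath{7, 42, /*BlockCount=*/2};
  auto It = Cache.find({ShortPath.Point, ShortPath.State});
  if (It != Cache.end())
    std::printf("merged: kept BlockCount=%u, dropped BlockCount=%u\n",
                It->second.BlockCount, ShortPath.BlockCount);
  return 0;
}
```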

Thus there are two mitigation strategies currently being evaluated:

i) Remove the heuristic in (2-b).
ii) Use a priority queue to hold the nodes to be explored; prefer nodes that give new source-code coverage over others
(or, alternatively, prefer nodes with the smallest loop-stack depth).
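
A rough sketch of the loop-stack-depth variant of (ii), again with hypothetical types rather than an actual patch: a comparator that pops the shallowest node first means the successor that leaves the loop is explored before the successors that keep iterating:

```cpp
#include <cstdio>
#include <queue>
#include <vector>

// Toy priority queue (hypothetical WorkItem type, not a patch to the
// analyzer) illustrating the "prefer least loop-stack depth" variant of
// option (ii).
struct WorkItem {
  unsigned LoopStackDepth; // nesting depth of the loop stack at this node
  const char *Label;
};

struct DeeperLast {
  bool operator()(const WorkItem &A, const WorkItem &B) const {
    return A.LoopStackDepth > B.LoopStackDepth; // min-heap on depth
  }
};

int main() {
  std::priority_queue<WorkItem, std::vector<WorkItem>, DeeperLast> Queue;
  Queue.push({1, "successor that iterates the loop again"});
  Queue.push({0, "successor that leaves the loop"});
  Queue.push({2, "successor inside a nested loop"});
  // Pops in depth order 0, 1, 2: the loop-exiting path no longer waits for
  // the unrolling limit to be exhausted.
  while (!Queue.empty()) {
    std::printf("exploring (depth %u): %s\n", Queue.top().LoopStackDepth,
                Queue.top().Label);
    Queue.pop();
  }
  return 0;
}
```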

Artem and I have evaluated option (i), and the results were surprisingly good: some reports disappear, and slightly more new reports appear.
The quality of the new reports seems to be slightly better, and I am still trying to figure out the exact reasons.
I suspect the merges resulting from heuristic (2-b) cause us to lose some actually valid reports.

Option (ii) has not been fully evaluated yet, but current experiments show slightly more reports (5-10%) and a radical decline in report lengths
(e.g. from 400+ nodes to fewer than 100 for the largest reports).

Are there any thoughts on the matter?

Personally, I think we should do both (i) and (ii), even if they shake up the results.
- The original idea behind heuristic (2-b) was to be able to produce a report even when we are out of budget, but since it actually results in fewer reports,
I think the data does not validate the approach.

- Option (ii) is, AFAIK, how most similar engines work, and it should get us much larger coverage (and shorter paths) for the same node budget,
even at the cost of the O(log N) overhead of the priority queue. Moreover, not having a priority queue will bite us later if we ever decide to further
increase the analyzer budget or the unroll limit.

George
