[cfe-dev] [analyzer] exploration strategies and paths

Mon Jan 29 17:43:14 PST 2018

On 29/01/2018 5:36 PM, Péter Szécsi wrote:
> Hi George and Artem,
>
> 2018-01-30 1:44 GMT+01:00 Artem Dergachev via cfe-dev 
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>>:
>
>
>     On 29/01/2018 4:12 PM, George Karpenkov via cfe-dev wrote:
>
>         Hi All,
>
>         I was investigating recently bug reports with very long
>         analyzer paths (more than a few hundred nodes).
>         In many of such cases the path is long for no good reason:
>         namely, the analyzer would go 3 times around the loop before
>         going further.
>         The issue is surprisingly common, and it was exacerbated with
>         a recent bump of analyzer thresholds.
>
>
>     Yeah, i guess everybody who used the analyzer has seen some of
>     those nasty reports with iterating over loops 4 times. It's like
>     why does it find the issue on the last iteration rather than on
>     the first iteration, given that we use a depth-first strategy? So
>     it's a great long-overdue thing to fix.
>
>     George, do you have any non-internal before/after html reports to
>     attach?
>
>
>         The problem is reproduced on the following file:
>
>         ```
>         extern int coin();
>
>         int foo() {
>              int *x = 0;
>              while (coin()) {
>                  if (coin())
>                      return *x;
>              }
>              return 0;
>         }
>
>         void bar() {
>              while(coin())
>                  if (coin())
>                      foo();
>         }
>         ```
>
>         While a shortest path to the error does not loop around, the
>         current version of the analyzer
>         will go around the loop three times before going further.
>         (and we are quite fortunate that the unrolling limit for loops
>         is three, otherwise it would keep going
>         until the unrolling limit is reached).
>
>         Multiple issues were discovered during the investigation.
>
>         1. Analyzer queue does not have a concept of priority, and
>         performs a simple DFS by default.
>         Thus if the successor of the if-branch under the loop in “bar"
>         containing the desired destination is generated second,
>         it will never be evaluated until the loop exploration limit is
>         exhausted.
>
>         2. The previous issue slows down the exploration, but is not
>         enough to get a pathological behavior of ultra-long paths.
>         The second problem is a combination of:
>         a) Block counter is not a part of a node's identity, and node
>         A with a small block counter can be merged into a node B with
>         a large block counter,
>         and the resulting node will have a block counter associated
>         with B.
>         b) The issue in (a) is triggered due to our heuristic to
>         abandon the function’s exploration and switch to conservative
>         evaluation
>         if we are already *inside* the function and the block limit
>         has been reached.
>
>         Issue (1) combined with (2-b) causes the problematic behavior:
>         the issue is discovered on the longest path first,
>         and by the time the shortest path gets to “bar”, the block
>         limit is already reached, and the switch to conservative
>         evaluation is performed.
>
>
>     2-a is not even required here.
>
>     With our DFS exploration order, on every iteration of the
>     while-loop within bar(), we'd take the false-branch of if() within
>     bar() from the worklist, see that it goes back to loop, and end up
>     with new true-branch and false-branch nodes of the next iteration
>     on the top of the worklist. Then we pop the false-branch again,
>     etc., until we run out of block count limit while having 4
>     true-branches in the worklist. Those would therefore evaluate in
>     the opposite order, and the first time we enter foo() we'd be on
>     the 4th iteration.
>
>     This situation can happen regardless of in which order we evaluate
>     if()-branches, by slightly modifying the example. So if the idea
>     in the previous paragraph is unclear, it should still be obvious
>     that sometimes we'd run into a function call on the longer path
>     earlier than on a shorter path.
>
>     Now, once we enter foo() and immediately find the bug, we also run
>     out of block count limit within foo(). Recall that we are on the
>     4th iteration of the while-loop in bar(), and here is where the
>     bug is found. Now, once evaluation of foo() is over, we record
>     that we failed to fully inline it, so it's probably too complex,
>     so let's evaluate it conservatively.
>
>     It means that on 3th, 2nd, 1st iteration we won't be able to find
>     the bug, because foo() is evaluated conservatively. So we're stuck
>     with the long report forever.
>
> I do not exactly see it why. If I'm not mistaken you described the 
> replay-without-inlining heuristics. However, I believe this 
> information is stored in the state which means that this only affects 
> one path (in this example the 4th iteration bug finding path). But 
> whenever we simulate the path of the 3rd/2nd/1st iteration where the 
> if(coin()) is true, that is another path. Maybe I just do not see 
> something trivial os just misunderstood something but could you 
> explain me, why does it affect other paths?
>

Nope, it's not part of the program state. It's in FunctionSummaries, 
which is a field in CoreEngine. So whenever we find a function we were 
unable to inline even once during analysis, we'd never inline it anymore 
on any branch. See stuff around markReachedMaxBlockCount().

>
>         Thus there are two mitigation strategies currently being
>         evaluated:
>
>         i) Remove the heuristic in (2-b)
>         ii) Use a priority queue to hold nodes which should be
>         explored; prefer nodes which give new source code coverage
>         over others
>         (or alternatively prefer nodes with least depth of loop stack)
>
>         Me and Artem have evaluated the option (i) and the results
>         were surprisingly good: some reports disappear, and slightly
>         more reports reappear.
>         The quality of the new reports seems to be slightly better,
>         and I am still trying to figure out exact reasons.
>
>
>     Yeah, i guess some explanation is necessary here. The skew of
>     results is pretty huge, and it's surprising that the number of
>     reports actually increases.
>
> Just to make sure: Does the number of actually *different *reports 
> increases? In case of a missing uniquing location a checker could 
> generate a lot of "spam", so find and report the same bug on different 
> paths. (And turning off these heuristics could lead into that. 
> Probably you already checked this, but seems really suspicious.)

Yeah, it's like +600/-300 unique reports on my nightly internal codebase 
run, which is a huge skew.

>
> Another interesting stuff about (i) could be that how many times we 
> reached a max size ExplodedGraph (max number of steps) and how many 
> more steps we do because of turning of these heuristics? I feel like 
> this change should result higher analysis time but less coverage 
> overall. However, I am not sure how significant these changes are. So, 
> if you manage to find more valuable bugs then these changes worth it, 
> I guess.
>
>     Just to be clear, both replay-without-inlining and
>     dont-inline-again-after-bailout heuristics were disabled in this test.
>
>         I suspect merges resulting from heuristic (2-b) cause us to
>         lose some actually valid reports.
>
>
>     Because replay-without-inlining is disabled, there should not be
>     many merges.
>
>         Option (ii) has not been evaluated fully yet, but current
>         experiments show slightly more reports (5-10%), and a radical
>         decline in report lengths
>         (e.g. from 400+ to <100 for largest reports)
>
>         Are there any thoughts on the matter?
>
>         Personally I think we should do both (i) and (ii), even if
>         they would shake up the results.
>         - The original idea for heuristics (2-b) was to be able to
>         produce a report even if we are out of budget, but since it
>         actually results in less reports,
>         I think the data does not validate the approach.
>
>         - Option (ii) is AFAIK how most similar engines work, and
>         should get us much larger coverage (and shorter paths) for the
>         same node budget,
>         even at the cost of log(N) overhead of the priority queue.
>         Moreover, not having the priority queue will bite us later if
>         we ever decide to further
>         increase the analyzer budget or to increase the unroll limit.
>
>
>     In the example above, (ii) means evaluating the first true-branch
>     of the if() in bar() before the second false-branch of the if() in
>     bar(), simply because it's *on an earlier loop iteration*. This,
>     indeed, sounds like the right thing to do, like, logically,
>     hopefully we'd be able to confirm this with a more careful evaluation.
>
> Option (ii) is something I personally really support and would like to 
> see implemented in the analyzer. I was already thinking on this change 
> earlier but did not seem easy to come up with a reliable heuristic for 
> that. The aim is clear and great but how would you do it? Would you 
> rate the possibilities based on the number of visits of the possible 
> next analyzed blocks?
>
>
> Thanks,
> Peter
>