[cfe-dev] [analyzer] exploration strategies and paths
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue Jan 30 13:10:44 PST 2018
On 30/01/2018 12:40 PM, Gábor Horváth via cfe-dev wrote:
> Hi George, Artem,
>
> I am glad that you are looking into this problem!
>
> On 30 January 2018 at 01:12, George Karpenkov via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
>
> Hi All,
>
> I was recently investigating bug reports with very long analyzer
> paths (more than a few hundred nodes).
> In many of these cases the path is long for no good reason: namely,
> the analyzer goes around a loop three times before
> proceeding.
> The issue is surprisingly common, and it was exacerbated by a
> recent bump of the analyzer's thresholds.
>
> The problem is reproduced on the following file:
>
> ```
> extern int coin();
>
> int foo() {
>   int *x = 0;
>   while (coin()) {
>     if (coin())
>       return *x;
>   }
>   return 0;
> }
>
> void bar() {
>   while (coin())
>     if (coin())
>       foo();
> }
> ```
>
> While the shortest path to the error does not go around the loop at
> all, the current version of the analyzer
> will go around the loop three times before going further
> (and we are quite fortunate that the loop unrolling limit is
> three; with a higher limit, the analyzer would keep looping
> until that limit was reached).
>
> Multiple issues were discovered during the investigation.
>
> 1. The analyzer worklist has no concept of priority and
> performs a simple DFS by default.
> Thus, if the successor of the if-branch under the loop in "bar"
> that leads to the desired destination is generated second,
> it is not evaluated until the loop exploration limit is
> exhausted.
>
> 2. The previous issue slows down the exploration, but it is not
> enough on its own to produce the pathological behavior of
> ultra-long paths.
> The second problem is a combination of:
> a) The block counter is not part of a node's identity, so a node A
> with a small block counter can be merged into a node B with a
> large block counter,
> and the resulting node keeps the block counter associated with B.
>
>
> Sorry for the questions, just wanted to clarify some things. You mean
> ExplodedNodes? By merge, you mean the same thing as "caching-out"?
Yeah, George noticed that if we construct the same ExplodedNode on two
different paths that have different block counts, we'd cache out on the
latter path, while the worklist element of the first path would still
possess the original block count.
This happens a lot when we're evaluating foo() conservatively in this
example.
This isn't directly related to our problem though, as I noted in
http://lists.llvm.org/pipermail/cfe-dev/2018-January/056719.html .
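
To make the caching-out mechanics concrete, here's a deliberately
simplified model (the types below are invented for this sketch and are
not the actual ExplodedGraph or worklist data structures): node
identity is keyed on (program point, state) only, so a path that
arrives later at the same key gets dropped, and its block count with
it.

```
#include <cstdio>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model -- not the real analyzer data structures.
struct Node {
  int ProgramPoint;
  int State;
  int BlockCount; // Stored on the node, but NOT part of its identity.
};

struct WorklistUnit {
  Node *N;
  int BlockCount; // Each worklist unit remembers its own counter.
};

int main() {
  // Nodes are deduplicated on (program point, state) alone.
  std::map<std::pair<int, int>, Node> Graph;
  std::vector<WorklistUnit> Worklist;

  // The long path (found first under DFS) reaches (point=5, state=7)
  // with a large block count and creates the node.
  Node &B = Graph[{5, 7}];
  B = {5, 7, /*BlockCount=*/100};
  Worklist.push_back({&B, 100});

  // The short path arrives later at the same (point, state) with a
  // small count. The key already exists, so it "caches out": no new
  // node, no new worklist unit, and the small count is discarded,
  // while the surviving worklist unit keeps the old count of 100.
  if (Graph.count({5, 7}))
    std::printf("cached out; node keeps count %d\n",
                Graph[{5, 7}].BlockCount);
}
```
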
> b) The issue in (a) is triggered by our heuristic of abandoning
> the function's exploration and switching to conservative evaluation
> if we are already *inside* the function when the block limit is
> reached.
>
> Issue (1) combined with (2-b) causes the problematic behavior: the
> bug is discovered on the longest path first,
> and by the time the shortest path gets to "bar", the block limit
> has already been reached, and the switch to conservative
> evaluation is performed.
>
> Thus there are two mitigation strategies currently being evaluated:
>
> i) Remove the heuristic in (2-b)
> ii) Use a priority queue to hold the nodes that should be explored;
> prefer nodes which give new source code coverage over others
> (or, alternatively, prefer nodes with the smallest loop stack
> depth); see the sketch below.
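>
> To give a rough idea of option (ii), here is a simplified sketch of
> a coverage-prioritizing worklist (the types and the ranking rule are
> invented for illustration; this is not the analyzer's actual
> worklist interface). A plain DFS worklist, as in issue (1), is just
> a stack with no such ranking, so an unluckily ordered successor can
> starve; here, units that would cover a not-yet-visited basic block
> are dequeued first, and ties are broken towards the shallowest loop
> stack:
>
> ```
> #include <algorithm>
> #include <set>
> #include <vector>
>
> // Hypothetical sketch -- not the real analyzer worklist.
> struct WorkUnit {
>   unsigned BlockID;   // Basic block this unit would enter next.
>   unsigned LoopDepth; // Current depth of the loop stack.
> };
>
> class CoverageWorklist {
>   std::vector<WorkUnit> Units;
>   std::set<unsigned> Visited; // Blocks already explored.
>
> public:
>   void enqueue(WorkUnit U) { Units.push_back(U); }
>   bool empty() const { return Units.empty(); }
>
>   // Dequeue the most promising unit: anything covering an unvisited
>   // block beats everything else; ties go to the shallowest loop
>   // stack. (A linear scan for clarity; a real implementation would
>   // use a priority queue and pay the log(N) cost instead.)
>   WorkUnit dequeue() {
>     auto Best = std::min_element(
>         Units.begin(), Units.end(),
>         [this](const WorkUnit &A, const WorkUnit &B) {
>           bool ANew = !Visited.count(A.BlockID);
>           bool BNew = !Visited.count(B.BlockID);
>           if (ANew != BNew)
>             return ANew; // New-coverage units rank first.
>           return A.LoopDepth < B.LoopDepth;
>         });
>     WorkUnit U = *Best;
>     Units.erase(Best);
>     Visited.insert(U.BlockID);
>     return U;
>   }
> };
> ```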
>
> Artem and I have evaluated option (i), and the results were
> surprisingly good: some reports disappear, and slightly more
> reports appear in their place.
> The quality of the new reports seems to be slightly better, and I
> am still trying to figure out the exact reasons.
> I suspect merges resulting from heuristic (2-b) cause us to lose
> some actually valid reports.
>
>
> I also find the results surprising. If you have more information
> about the reasons, please do not forget to follow up in this
> thread. We are curious :)
>
>
> Option (ii) has not been fully evaluated yet, but current
> experiments show slightly more reports (5-10%) and a radical
> decline in report lengths
> (e.g. from 400+ nodes to under 100 for the largest reports).
>
> Are there any thoughts on the matter?
>
> Personally, I think we should do both (i) and (ii), even if they
> shake up the results.
> - The original idea behind heuristic (2-b) was to be able to produce
> a report even when we are out of budget, but since it actually
> results in fewer reports,
> I think the data does not validate the approach.
>
> - Option (ii) is, AFAIK, how most similar engines work, and it
> should get us much larger coverage (and shorter paths) for the
> same node budget,
> even at the cost of the log(N) overhead of a priority queue.
> Moreover, not having a priority queue will bite us later if we
> ever decide to further
> increase the analyzer budget or the unroll limit.
>
>
> I wonder what the performance implications will be. But I also like
> the idea of having a priority queue. If we find that we get more
> and better reports
> but also worse performance, we can consider reducing the
> analysis budget slightly.
>
> Regards,
> Gábor
>
>
> George