[cfe-dev] [GSOC] Improved Loop Execution Modeling in Clang Static Analyzer - Final Report

Tue Aug 29 07:26:15 PDT 2017

Hello everyone,

During the summer I was working on improving the loop modeling in the Clang
Static Analyzer.
NOTE: The full length report, which also describes various implementation
details and results of quantitative evaluation of the implemented features,
can be found here
<https://docs.google.com/document/d/1-kHFeVPVzkR_HbE5_yt72Kp_qIoONlmwGf5kyanWsnI/edit>
.

1. Motivation and aims:
In the current state of the analyzer, it handles loops quite simple. It
unrolls them 4 times by default and then cut the analysis of that path
where the loop would have been unrolled more times. This behavior is
reached by having a counter for every CFG blocks which determines that how
many times that block was already visited. The analyzer checks it for every
encountered block if the number of visits is more than 4 on the simulated
path. If yes, then it aborts the simulation on that path.
The loss in code coverage is one of the problems with this approach to loop
modeling. Specifically, in cases where the loop bound is statically known
to be greater than 4, the analyzer often did not analyze the code following
the loop. Thus, the naive loop handling (described above) could lead to
unchecked code. Here is a small example for that:

void f() {
 int a[6];
 for (int i = 0; i < 6; i++){
  a[i] = i+2;
 }
 //complex body
}

In the above example, because the loop bound is a statically known value,
the analysis will be stopped at the 4th iteration of the loop and the
“complex body” after the loop will not be analyzed.

2. The working process

My work on improving the loop modeling primarily can be divided into 2
tasks:

   1. Loop Unrolling - Work up heuristics and AST patterns in order to find
   specific loops which are worth to be completely unrolled. All patches
   for this work have been reviewed and committed to the clang repository
   which are the following:
      1.

      The initial implementation: D34260 <https://reviews.llvm.org/D34260>
      <https://reviews.llvm.org/D34260>
      2.

      Changed the CFG of the clang compiler to represent loops more
      directly: D35668 <https://reviews.llvm.org/D35668> and D35670
      <https://reviews.llvm.org/D35670>
      3.

      Update the Loop Unrolling feature to use the newly added CFG
      affordances: D35684 <https://reviews.llvm.org/D35684>
      1. Evaluated the option on different C/C++ open source projects, and
      measured performance as well as coverage to choose the best defaults.
      2.

      New heuristics and fixes added in: D36962
      <https://reviews.llvm.org/D36962>, D37103
      <https://reviews.llvm.org/D37103>, and D37181
      <https://reviews.llvm.org/D37181>
      2. Loop Widening - The Loop Unrolling work laid the infrastructure
   for future improvements in loop modeling. In the last part of the summer, I
   started investigating approaches to Loop Widening, where the analyzer
   simulates the execution of an arbitrary number of iterations. There is
   already a solution which reaches this behavior by discarding all of the
   known information. My aim was to give a more precise solution/approach for
   widening. I submitted the following patches for review:
      1.

      The initial patch which only invalidates the possibly changed
      variables but does not handle every possible case. In this scenario if we
      encounter a not handled statement we decide not to widen the loop.
      D36690 <https://reviews.llvm.org/D36690>
      2.

      In order to handle nested loops better, the analyzer tries to
      reanalyze the loops with a widened state which execution was
aborted due to
      visiting the same block too many times on a given path. D37108
      <https://reviews.llvm.org/D37108>

3. Usage
Currently, both of the features are hidden behind a flag. In order to use
the loop unrolling feature, the user has to pass the ‘unroll-loops=true’
config option to the analyzer.
For widening you have to download and apply the patches from the
Phabricator and build the clang with these changes.(D36690
<https://reviews.llvm.org/D36690> and D37108
<https://reviews.llvm.org/D37108>) It is behind the flag 'widen-loops' so
you have to pass the config option ‘widen-loops=true’ to the analyzer in
order to use this feature.

4. Results
The loop unrolling results were summarized and sent to the cfe-dev mailing
list (link <http://lists.llvm.org/pipermail/cfe-dev/2017-August/055210.html>
<http://lists.llvm.org/pipermail/cfe-dev/2017-August/055210.html>). The
feature increased the coverage of the analysis by 2.3% in average on the
investigated projects while the analysis time remained the same (or even
went down). Based on these results I would suggest using the loop unrolling
feature by default.

The widening results and the question about changing the functionality of
the flag ‘loop-widening’ was sent to the cfe-dev as well: link
<http://lists.llvm.org/pipermail/cfe-dev/2017-August/055222.html>. Although
has a small observable coverage loss (2.3% in average comparing to Widen)
it's offset by the number of the false positives not discovered. In
conclusion it would be beneficial to replace the current implementation of
the 'widen-loops' option.

Regards,
Peter Szecsi
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170829/473f47fd/attachment.html>