[cfe-dev] [analyzer][GSoC] Problem Statement: Improving BugReporter with static backward program slicing

Tue Apr 2 12:16:25 PDT 2019

Hi!

In this letter, I'd like to describe a particular problem in the Clang
StaticAnalyzer's BugReporter class that I'd like to tackle in a Google
Summer of Code project this summer. I'll show real-world examples on some
of its shortcomings, and propose a potential solution using static backward
program slicing. At last, I'll ask some specific questions. I'd love to
hear any and all feedback on this!

This is a *problem statement, *not a *proposal*. I plan to send out a
formal proposal by Friday (but not later then Saturday), that will contain
more details on both the problem and the solution. I don't introduce myself
or my previous works within the project, that will also be detailed in my
upcoming letter. I also plan to incorporate the feedback I'll receive to
this letter.

*---=== BugReporter constructs bad reports ===---*

*What does the BugReporter do?*

After the Static Analyzer found an error, the BugReporter receives an
ExplodedNode, which, accompanied by its predecessors, contains all the
information needed to reproduce that error. This *bugpath *is then
shortened with a variety of heuristics, such as removing unrelated function
calls, unrelated branches and so on. BugReporter by the end of this process
will construct a PathDiagnostic object for each report, that is, ideally,
minimal.

*Example 1.*

Consider the following code example:

1  // leak.cpp
2  int *global_ptr;
3
4  void save_ext(int storeIt, int *ptr) {
5    if (storeIt)
6      global_ptr = ptr;
7  }
8
9  void test(int b) {
10   int *myptr = new int;
11   save_ext(b, myptr);
12   delete global_ptr;
13 }

It's clear that if test is invoked with b's value set to true, there is no
error in the program. However, should b be false, we'll leak memory.

$ clang -cc1 -analyze -analyzer-checker=core,cplusplus leak.cpp

[image: image.png]

The Static Analyzer is able to catch this error, but fails to mention the
call to save_ext entirely, despite the error only occurring because the
analyzer assumed that storeIt is false. I've also attached the exploded
graph leak.svg that demonstrates this.

*Example 2.*

Consider the following code example:

1  // divbyzero.cpp
2  void f() {
3    int i = 0;
4    (void) (10 / i);
5  }
6
7  void g() { f(); }
8  void h() { g(); }
9  void j() { h(); }
10 void k() { j(); }

Its clear that a call to f will result in a division by zero error.

$ clang -cc1 -analyze -analyzer-checker=core divbyzero.cpp
[image: image.png]
Again, the Static Analyzer is plenty smart enough to catch this, but the
constructed bug report is littered with a lot of useless information -- it
would be enough to only show the body of f, and, optionally where f itself
was called. For the sake of completeness, I've attached divbyzero.svg that
contains the exploded graph for the above code snippet.

The above examples demonstrate that BugReporter sometimes reduces the
bugpath too much or too little.

*---=== Solving the above problem with static backward program slicing
===---*

*What is static backward program slicing?*

A *program slice* consists of the parts of a program that (potentially)
affect the values computed at some point of interest, called the slicing
criterion. *Program slicing* is a decomposition technique that elides
program components not relevant to the slicing criterion (which is a pair
of (statement, set of variables)), creating a program slice[1][2].
*Static *slicing
preserves the meaning of the variable(s) in the slicing criterion for all
possible inputs[1]. *Backward *slices answer the question “what program
components might affect a selected computation?”[1]

While statement-minimal slices are not necessarily unique[3], Weisel
developed a popular algorithm that constructs one. In essence, his
fix-point algorithm constructs sets of *relevant variables* for each edge
in between node *i* and node *j* in a CFG graph, from which he
constructs *relevant
statements*. The fix-point of the relevant statements set is the slice
itself.

One of the characteristic of his algorithm is that the resulting program
slice will be executable. However, our problem doesn't require the code to
be executable, so we could use a more "aggressive" approach that creates a
smaller slice. An improvement to his algorithm is presented in [4].

*How does this relate to BugReporter?*

We can show that using Weiser's algorithm, issues raised in Example 1. and
Example 2. can be improved upon.

For example 1., the statement-minimal program slice with the criterion
(13, <dynamically
allocated int object>) will contain the statements 5 and 6, and for example
2., the statement-minimal program slice with the criterion (4, i) won't
contain anything but statements 3 and 4. For the latter, we can even
improve the algorithm to also contain statement 7, where a call to f is
made.

The analyzer, as stated earlier, gives BugReporter an ExplodedNode, from
which the slicing criterion must be constructed. The statement
corresponding to this node, coupled with the interesting regions the
checker itself marked could be used to construct this slicing criterion.

*Challenges*

While the algorithm Weiser developed along with the improvements made by
others are interprocedural, I would imagine that in implementation, it
would be a challenging step from an intraprocedural prototype.

Several articles also describe pointers, references, and dynamically
allocated regions, as well as gotos and other tricky parts of the language,
but I still expect to see some skeletons falling out of the closet when
implementing this for C++, not only C.

*Drawbacks*

Static slicing, as an algorithm that doesn't know anything about input
values, suffers from the same issues that all static techniques do, meaning
that without heuristics, it'll have to do very rough guesses, possibly
leaving a far too big program slice. However, with the symbolic values the
analyzer holds, this could be improved, turning this into *conditioned
slicing, *as described in [1]. This however is only planned as a followup
work after GSoC.

For this reason, this project would be developed as an alternative approach
to some of the techniques used in BugReporter, as an optional
off-by-default analyzer feature.

*---=== Questions ===---*

What do you think of this approach?
Do you think that implementing this algorithm is achievable, but tough
enough task for GSoC?
Would you prefer to see a general program slicing library, or an
analyzer-specific implementation? Traversing the ExplodedGraph would be far
easier in terms of what I want to achieve, but a more general approach that
traverses the CFG (like llvm::DominatorTree[5]) could be beneficial to more
developers, but possibly at the cost of not being able to improve the
prototype with the symbolic value information the analyzer holds.

*References, links*

[1] Gallagher, Keith, and David Binkley. "Program slicing." *2008 Frontiers
of Software Maintenance*.
[2] Tip, Frank. *A survey of program slicing techniques*. Centrum voor
Wiskunde en Informatica, 1994.
[3] Weiser, Mark. "Program slicing." *Proceedings of the 5th international
conference on Software engineering*. IEEE Press, 1981.
[4] Binkley, David. "Precise executable interprocedural slices." *ACM
Letters on Programming Languages and Systems (LOPLAS)* 2.1-4 (1993): 31-45.
[5] http://llvm.org/doxygen/classllvm_1_1DominatorTree.html

Link to previous GSoC related letter I sent:
http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html

Cheers,
Kristóf Umann
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/f36236bf/attachment.html>