[cfe-dev] [analyzer][GSoC] Problem Statement: Improving BugReporter with static backward program slicing

Tue Apr 2 12:21:22 PDT 2019

Somehow the images and the attached files were left out, please find them
here:

On Tue, 2 Apr 2019 at 21:16, Kristóf Umann <dkszelethus at gmail.com> wrote:

> Hi!
>
> In this letter, I'd like to describe a particular problem in the Clang
> StaticAnalyzer's BugReporter class that I'd like to tackle in a Google
> Summer of Code project this summer. I'll show real-world examples on some
> of its shortcomings, and propose a potential solution using static backward
> program slicing. At last, I'll ask some specific questions. I'd love to
> hear any and all feedback on this!
>
> This is a *problem statement, *not a *proposal*. I plan to send out a
> formal proposal by Friday (but not later then Saturday), that will contain
> more details on both the problem and the solution. I don't introduce myself
> or my previous works within the project, that will also be detailed in my
> upcoming letter. I also plan to incorporate the feedback I'll receive to
> this letter.
>
> *---=== BugReporter constructs bad reports ===---*
>
> *What does the BugReporter do?*
>
> After the Static Analyzer found an error, the BugReporter receives an
> ExplodedNode, which, accompanied by its predecessors, contains all the
> information needed to reproduce that error. This *bugpath *is then
> shortened with a variety of heuristics, such as removing unrelated function
> calls, unrelated branches and so on. BugReporter by the end of this
> process will construct a PathDiagnostic object for each report, that is,
> ideally, minimal.
>
> *Example 1.*
>
> Consider the following code example:
>
> 1  // leak.cpp
> 2  int *global_ptr;
> 3
> 4  void save_ext(int storeIt, int *ptr) {
> 5    if (storeIt)
> 6      global_ptr = ptr;
> 7  }
> 8
> 9  void test(int b) {
> 10   int *myptr = new int;
> 11   save_ext(b, myptr);
> 12   delete global_ptr;
> 13 }
>
> It's clear that if test is invoked with b's value set to true, there is
> no error in the program. However, should b be false, we'll leak memory.
>
> $ clang -cc1 -analyze -analyzer-checker=core,cplusplus leak.cpp
> [image: image.png]
>
> The Static Analyzer is able to catch this error, but fails to mention the
> call to save_ext entirely, despite the error only occurring because the
> analyzer assumed that storeIt is false. I've also attached the exploded
> graph leak.svg that demonstrates this.
>
> *Example 2.*
>
> Consider the following code example:
>
> 1  // divbyzero.cpp
> 2  void f() {
> 3    int i = 0;
> 4    (void) (10 / i);
> 5  }
> 6
> 7  void g() { f(); }
> 8  void h() { g(); }
> 9  void j() { h(); }
> 10 void k() { j(); }
>
> Its clear that a call to f will result in a division by zero error.
>
> $ clang -cc1 -analyze -analyzer-checker=core divbyzero.cpp
> [image: image.png]
> Again, the Static Analyzer is plenty smart enough to catch this, but the
> constructed bug report is littered with a lot of useless information -- it
> would be enough to only show the body of f, and, optionally where f itself
> was called. For the sake of completeness, I've attached divbyzero.svg that
> contains the exploded graph for the above code snippet.
>
> The above examples demonstrate that BugReporter sometimes reduces the
> bugpath too much or too little.
>
> *---=== Solving the above problem with static backward program slicing
> ===---*
>
> *What is static backward program slicing?*
>
> A *program slice* consists of the parts of a program that (potentially)
> affect the values computed at some point of interest, called the slicing
> criterion. *Program slicing* is a decomposition technique that elides
> program components not relevant to the slicing criterion (which is a pair
> of (statement, set of variables)), creating a program slice[1][2].
> *Static *slicing preserves the meaning of the variable(s) in the slicing
> criterion for all possible inputs[1]. *Backward *slices answer the
> question “what program components might affect a selected computation?”[1]
>
> While statement-minimal slices are not necessarily unique[3], Weisel
> developed a popular algorithm that constructs one. In essence, his
> fix-point algorithm constructs sets of *relevant variables* for each edge
> in between node *i* and node *j* in a CFG graph, from which he constructs *relevant
> statements*. The fix-point of the relevant statements set is the slice
> itself.
>
> One of the characteristic of his algorithm is that the resulting program
> slice will be executable. However, our problem doesn't require the code to
> be executable, so we could use a more "aggressive" approach that creates a
> smaller slice. An improvement to his algorithm is presented in [4].
>
> *How does this relate to BugReporter?*
>
> We can show that using Weiser's algorithm, issues raised in Example 1. and
> Example 2. can be improved upon.
>
> For example 1., the statement-minimal program slice with the criterion
> (13, <dynamically allocated int object>) will contain the statements 5
> and 6, and for example 2., the statement-minimal program slice with the
> criterion (4, i) won't contain anything but statements 3 and 4. For the
> latter, we can even improve the algorithm to also contain statement 7,
> where a call to f is made.
>
> The analyzer, as stated earlier, gives BugReporter an ExplodedNode, from
> which the slicing criterion must be constructed. The statement
> corresponding to this node, coupled with the interesting regions the
> checker itself marked could be used to construct this slicing criterion.
>
> *Challenges*
>
> While the algorithm Weiser developed along with the improvements made by
> others are interprocedural, I would imagine that in implementation, it
> would be a challenging step from an intraprocedural prototype.
>
> Several articles also describe pointers, references, and dynamically
> allocated regions, as well as gotos and other tricky parts of the language,
> but I still expect to see some skeletons falling out of the closet when
> implementing this for C++, not only C.
>
> *Drawbacks*
>
> Static slicing, as an algorithm that doesn't know anything about input
> values, suffers from the same issues that all static techniques do, meaning
> that without heuristics, it'll have to do very rough guesses, possibly
> leaving a far too big program slice. However, with the symbolic values the
> analyzer holds, this could be improved, turning this into *conditioned
> slicing, *as described in [1]. This however is only planned as a followup
> work after GSoC.
>
> For this reason, this project would be developed as an alternative
> approach to some of the techniques used in BugReporter, as an optional
> off-by-default analyzer feature.
>
> *---=== Questions ===---*
>
> What do you think of this approach?
> Do you think that implementing this algorithm is achievable, but tough
> enough task for GSoC?
> Would you prefer to see a general program slicing library, or an
> analyzer-specific implementation? Traversing the ExplodedGraph would be
> far easier in terms of what I want to achieve, but a more general approach
> that traverses the CFG (like llvm::DominatorTree[5]) could be beneficial
> to more developers, but possibly at the cost of not being able to improve
> the prototype with the symbolic value information the analyzer holds.
>
> *References, links*
>
> [1] Gallagher, Keith, and David Binkley. "Program slicing." *2008
> Frontiers of Software Maintenance*.
> [2] Tip, Frank. *A survey of program slicing techniques*. Centrum voor
> Wiskunde en Informatica, 1994.
> [3] Weiser, Mark. "Program slicing." *Proceedings of the 5th
> international conference on Software engineering*. IEEE Press, 1981.
> [4] Binkley, David. "Precise executable interprocedural slices." *ACM
> Letters on Programming Languages and Systems (LOPLAS)* 2.1-4 (1993):
> 31-45.
> [5] http://llvm.org/doxygen/classllvm_1_1DominatorTree.html
>
> Link to previous GSoC related letter I sent:
> http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html
>
> Cheers,
> Kristóf Umann
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/d974439c/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 126147 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/d974439c/attachment.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 171712 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/d974439c/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: divbyzero.svg
Type: image/svg+xml
Size: 49561 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/d974439c/attachment.svg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: leak.svg
Type: image/svg+xml
Size: 125198 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190402/d974439c/attachment-0001.svg>