[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Fri Jun 7 16:59:30 PDT 2019

Hi!

I'm currently working on a GSoC project that aims to improve the
description we provide for bug reports the Static Analyzer emits.

We divided the problem (that the bugreports didn't explain how the bug came
about very well) into two parts:

1. The information is available in the bug path (basically parts of the
code the analyzer actually explored on that particular path of execution).
We refer to these as the "must-have-happened" case.
2. The information isn't available, possibly not even in the ExplodedGraph
(parts of the program the analyzer explored on all paths of execution). We
call this the "should-not-have-happened" case.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 int main() {

09   int *x = 0; // x initialized to 0

10   flag = 1;

11   foo(); // no note, but there should be one

12   if (flag) // assumed false

13     x = new int;

14   foo(); // no note, but there should be one

15

16   if (flag) // assumed true

17     *x = 5; // warn

18 }

The first branch is a good example for the "should-not-have-happened" case:
the bug path doesn't contain line 13, so we don't really know whether the
condition on line 12 is a big deal, which in this case it is (unlike if it
only guarded a print of some sort)

The second branch is a good example for the "must-have-happened" case: we
know that the condition on line 16 is very important.

I like to think that I made a very decent progress on the first part, as
it's objectively a lot simpler. A great addition to the analyzer is that it
now understands (at least, when my patches land) control dependencies in
the CFG. With that, we can figure out that flag's value on line 16 is
crucial, so we track it, resulting in notes being added on line 14, and on
line 5, explaining that flag's value changed.

However, we are very unsure about how to tackle the second part on the
project. I gave this a lot of thought, we had a couple meetings in private,
and I'd like to share where I am with this.

My project proposed to implement a static backward program slicing
algorithm which would definitely be amazing, but seems to be an unrealistic
approach. Basically, any slicing algorithm that is interprocedural would
require a system dependence graph, something we just simply don't have, not
even for LLVM.

Whenever pointer semantics are in the picture, we have a really hard time:
it's something that is super difficult to reason about without symbolic
execution, and it would probably be a good idea not to tackle such cases
until we come up with something clever (maybe the pset calculation Gábor
Horváth is working on?). We should, however, being able to handle reference
parameters.

So then, what's the goal exactly?

We'd like to teach the analyzer that some code blocks (even in other
functions) could have written the variable that is responsible for the bug
(x, in the included example), and is very important to the bug report. We
could use this information to track variables similarly to the
"must-have-happened" case: track conditions that prevented the write from
happening.

To the specifics. I think it's reasonable to implement an
*intraprocedural* slicing
algorithm. For the included example, I think we should be able to discover
the potential write to x on line 9. We don't need anything we already don't
have to achieve this.

Reasoning across function boundaries could be achieved by inlining
functions (as I understand it, simply insert the function's CFGBlocks), but
there are big unknowns in this.

So my question is, do you think such inlining is possible? If not, should I
just create a primitive variable-to-parameter mapping? Is there something
else on the table?

Cheers!
Kristóf Umann
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190608/3bb53e4c/attachment.html>