[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Sun Jun 9 17:50:04 PDT 2019

I gave this some thought, and realized that we could divide the
"should-not-have-happened" cases into 2 categories:

1. The unexplored block is within a function that the analyzer visited.
2. The unexplored block lies within a function that the analyzer didn't
even visit.

For example:

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  if (flag) // assumed false

10    i = new int;

11 }

12

13 int main() {

14   int *x = 0; // x initialized to 0

15   flag = 1;

16   foo(); // no note, but there should be one

17   bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }

Observe the subtle difference that on line 17, function bar is called, and
the first branch lies in that function. This is definitely a "category 1"
case, since the analyzer inlined and visited bar(), but again, failed the
realize the importance of flag due to not visiting line 10, and makes no
mention of line 16 in the bugreport.

01 int flag;

02 bool coin();

03

04 void foo() {

05   flag = coin(); // no note, but there should be one

06 }

07

08 void bar(int &i) {

09  i = new int;

10 }

11

12 int main() {

13   int *x = 0; // x initialized to 0

14   flag = 1;

15   foo(); // no note, but there should be one

16 if (flag) // assumed false

17    bar(x);

18   foo(); // no note, but there should be one

19

20   if (flag) // assumed true

21     *x = 5; // warn

22 }

In this example, the if statement stays, but the assignment to x is now
found in bar(). In this case, the analyzer didn't explore bar(), so it's a
"category 2" case.

Now, the reason why these categories are important is that if the analyzer
actually explored a function, it solves the problem of parameters: for the
example shown for "category 1", we can simply ask the analyzer whether x in
function main() is the same as i in function bar(). This is, as stated in
my previous letter, far more difficult for "category 2" cases.

I think when I'm finished with the "must-have-happened" cases, I could
start enhancing NoStoreFuncVisitor to gather conditions possibly worth
tracking for "category 1" cases, see whether an intraprocedural slicing
algorithm makes sense, could worry about resolving parameters for the other
case one when I get there.

Any thoughts on this approach? :)

On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com> wrote:

> Hi!
>
> I'm currently working on a GSoC project that aims to improve the
> description we provide for bug reports the Static Analyzer emits.
>
> We divided the problem (that the bugreports didn't explain how the bug
> came about very well) into two parts:
>
> 1. The information is available in the bug path (basically parts of the
> code the analyzer actually explored on that particular path of execution).
> We refer to these as the "must-have-happened" case.
> 2. The information isn't available, possibly not even in the ExplodedGraph
> (parts of the program the analyzer explored on all paths of execution). We
> call this the "should-not-have-happened" case.
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05   flag = coin(); // no note, but there should be one
>
> 06 }
>
> 07
>
> 08 int main() {
>
> 09   int *x = 0; // x initialized to 0
>
> 10   flag = 1;
>
> 11   foo(); // no note, but there should be one
>
> 12   if (flag) // assumed false
>
> 13     x = new int;
>
> 14   foo(); // no note, but there should be one
>
> 15
>
> 16   if (flag) // assumed true
>
> 17     *x = 5; // warn
>
> 18 }
>
> The first branch is a good example for the "should-not-have-happened"
> case: the bug path doesn't contain line 13, so we don't really know whether
> the condition on line 12 is a big deal, which in this case it is (unlike if
> it only guarded a print of some sort)
>
> The second branch is a good example for the "must-have-happened" case: we
> know that the condition on line 16 is very important.
>
> I like to think that I made a very decent progress on the first part, as
> it's objectively a lot simpler. A great addition to the analyzer is that it
> now understands (at least, when my patches land) control dependencies in
> the CFG. With that, we can figure out that flag's value on line 16 is
> crucial, so we track it, resulting in notes being added on line 14, and on
> line 5, explaining that flag's value changed.
>
> However, we are very unsure about how to tackle the second part on the
> project. I gave this a lot of thought, we had a couple meetings in private,
> and I'd like to share where I am with this.
>
> My project proposed to implement a static backward program slicing
> algorithm which would definitely be amazing, but seems to be an unrealistic
> approach. Basically, any slicing algorithm that is interprocedural would
> require a system dependence graph, something we just simply don't have, not
> even for LLVM.
>
> Whenever pointer semantics are in the picture, we have a really hard time:
> it's something that is super difficult to reason about without symbolic
> execution, and it would probably be a good idea not to tackle such cases
> until we come up with something clever (maybe the pset calculation Gábor
> Horváth is working on?). We should, however, being able to handle reference
> parameters.
>
> So then, what's the goal exactly?
>
> We'd like to teach the analyzer that some code blocks (even in other
> functions) could have written the variable that is responsible for the bug
> (x, in the included example), and is very important to the bug report. We
> could use this information to track variables similarly to the
> "must-have-happened" case: track conditions that prevented the write from
> happening.
>
> To the specifics. I think it's reasonable to implement an
> *intraprocedural* slicing algorithm. For the included example, I think we
> should be able to discover the potential write to x on line 9. We don't
> need anything we already don't have to achieve this.
>
> Reasoning across function boundaries could be achieved by inlining
> functions (as I understand it, simply insert the function's CFGBlocks), but
> there are big unknowns in this.
>
> So my question is, do you think such inlining is possible? If not, should
> I just create a primitive variable-to-parameter mapping? Is there something
> else on the table?
>
> Cheers!
> Kristóf Umann
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/a216d6cc/attachment.html>