[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Mon Jun 10 19:18:58 PDT 2019

Hmm, what do you mean by this line?:

         trackExpressionValue(N, RHS of assignment, Report)

Like, how does the existing visitor infrastructure behave when the 
statement is, by construction, never appears in the bug path? It doesn't 
even have a value to track, as it never appeared in the Environment.

P.S. I just realized that this discussion would probably be messy to 
read later through archives, as old revisions of the google doc are no 
longer available.

On 6/10/19 6:46 PM, Kristóf Umann wrote:
> I rewrote my sketch algorithm that showcases how I imagine this to work.
>
> https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit
>
> On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <dkszelethus at gmail.com 
> <mailto:dkszelethus at gmail.com>> wrote:
>
>
>
>     On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com
>     <mailto:noqnoqneo at gmail.com>> wrote:
>
>         The problem we're solving is *entirely* about inter-procedural
>         analysis. Like, as long as there are no inlined calls on the
>         path (and therefore no pruning is being done), then there's no
>         problem at all: the user can easily see that the value is
>         untouched by direct observation.
>
>
>     I have explained myself poorly. I intend to implement an
>     /intraprocedural/ slicing, but that would work
>     /interprocedurally/ by resolving function calls with the help of
>     the analyzer. This is important, because for the by-the-book
>     interprocedural slicing you'd need a system dependence graph. Now,
>     if a function isn't inlined, we need to do this on our own. I see
>     a couple unknowns here (how deep do we explore such functions? how
>     to we resolve non-trivial parameter passing? are we sure that
>     tracking expressions here is actually meaningful?) but I think
>     tackling simple cases is achievable.
>
>         In this sense i find the lack of note in example 2 completely
>         non-problematic (as long as control dependency tracking
>         lands). It's not a problem that the user isn't told that bar()
>         could have initialized 'x', because the user already knows
>         that bar() is not even called in the first place.
>
>
>     Would you agree that tracking flag however would be good?
>
>         However, there may be a combination of category 1 and category
>         2 that is much more interesting: what if main()
>         unconditionally calls foo() that conditionally calls bar()
>         that unconditionally initializes the value? Eg.:
>
>         void bar(int *x) {
>           *x = 1;  // the interesting event that doesn't happen
>         }
>
>         void foo(int *x) {
>           if (rand())  // control dependency of the interesting event
>             bar(x);
>         }
>
>         int main() {
>           int x;
>           foo(&x);
>           use(x); // use of uninitialized value
>         }
>
>         In this case it is essential to highlight the path within
>         foo() so that to explain how it fails to call bar().
>
>         On 09.06.2019 17:50, Kristóf Umann wrote:
>>         I gave this some thought, and realized that we could divide
>>         the "should-not-have-happened" cases into 2 categories:
>>
>>         1. The unexplored block is within a function that the
>>         analyzer visited.
>>         2. The unexplored block lies within a function that the
>>         analyzer didn't even visit.
>>
>>         For example:
>>
>>         01 int flag;
>>
>>         02 bool coin();
>>
>>         03
>>
>>         04 void foo() {
>>
>>         05   flag = coin();// no note, but there should be one
>>
>>         06 }
>>
>>         07
>>
>>         08 void bar(int &i) {
>>
>>         09  if (flag) // assumed false
>>
>>         10    i = new int;
>>
>>         11 }
>>
>>         12
>>
>>         13 int main() {
>>
>>         14   int *x = 0; // x initialized to 0
>>
>>         15   flag = 1;
>>
>>         16   foo();// no note, but there should be one
>>
>>         17   bar(x);
>>
>>         18   foo(); // no note, but there should be one
>>
>>         19
>>
>>         20   if (flag) // assumed true
>>
>>         21     *x = 5; // warn
>>
>>         22 }
>>
>>
>>         Observe the subtle difference that on line 17, function bar
>>         is called, and the first branch lies in that function. This
>>         is definitely a "category 1" case, since the analyzer inlined
>>         and visited bar(), but again, failed the realize the
>>         importance of flag due to not visiting line 10, and makes no
>>         mention of line 16 in the bugreport.
>>
>>         01 int flag;
>>
>>         02 bool coin();
>>
>>         03
>>
>>         04 void foo() {
>>
>>         05   flag = coin();// no note, but there should be one
>>
>>         06 }
>>
>>         07
>>
>>         08 void bar(int &i) {
>>
>>         09  i = new int;
>>
>>         10 }
>>
>>         11
>>
>>         12 int main() {
>>
>>         13   int *x = 0; // x initialized to 0
>>
>>         14   flag = 1;
>>
>>         15   foo();// no note, but there should be one
>>
>>         16 if (flag) // assumed false
>>
>>         17    bar(x);
>>
>>         18   foo(); // no note, but there should be one
>>
>>         19
>>
>>         20   if (flag) // assumed true
>>
>>         21     *x = 5; // warn
>>
>>         22 }
>>
>>
>>         In this example, the if statement stays, but the assignment
>>         to x is now found in bar(). In this case, the analyzer didn't
>>         explore bar(), so it's a "category 2" case.
>>
>>
>>         Now, the reason why these categories are important is that if
>>         the analyzer actually explored a function, it solves the
>>         problem of parameters: for the example shown for "category
>>         1", we can simply ask the analyzer whether x in function
>>         main() is the same as i in function bar(). This is, as stated
>>         in my previous letter, far more difficult for "category 2" cases.
>>
>>
>>         I think when I'm finished with the "must-have-happened"
>>         cases, I could start enhancing NoStoreFuncVisitor to gather
>>         conditions possibly worth tracking for "category 1" cases,
>>         see whether an intraprocedural slicing algorithm makes sense,
>>         could worry about resolving parameters for the other case one
>>         when I get there.
>>
>>
>>         Any thoughts on this approach? :)
>>
>>
>>         On Sat, 8 Jun 2019 at 01:59, Kristóf Umann
>>         <dkszelethus at gmail.com <mailto:dkszelethus at gmail.com>> wrote:
>>
>>             Hi!
>>
>>             I'm currently working on a GSoC project that aims to
>>             improve the description we provide for bug reports the
>>             Static Analyzer emits.
>>
>>             We divided the problem (that the bugreports didn't
>>             explain how the bug came about very well) into two parts:
>>
>>             1. The information is available in the bug path
>>             (basically parts of the code the analyzer actually
>>             explored on that particular path of execution). We refer
>>             to these as the "must-have-happened" case.
>>             2. The information isn't available, possibly not even in
>>             the ExplodedGraph (parts of the program the analyzer
>>             explored on all paths of execution). We call this the
>>             "should-not-have-happened" case.
>>
>>             01 int flag;
>>
>>             02 bool coin();
>>
>>             03
>>
>>             04 void foo() {
>>
>>             05   flag = coin();// no note, but there should be one
>>
>>             06 }
>>
>>             07
>>
>>             08 int main() {
>>
>>             09   int *x = 0; // x initialized to 0
>>
>>             10   flag = 1;
>>
>>             11   foo();// no note, but there should be one
>>
>>             12   if (flag) // assumed false
>>
>>             13     x = new int;
>>
>>             14   foo(); // no note, but there should be one
>>
>>             15
>>
>>             16   if (flag) // assumed true
>>
>>             17     *x = 5; // warn
>>
>>             18 }
>>
>>
>>             The first branch is a good example for the
>>             "should-not-have-happened" case: the bug path doesn't
>>             contain line 13, so we don't really know whether the
>>             condition on line 12 is a big deal, which in this case it
>>             is (unlike if it only guarded a print of some sort)
>>
>>             The second branch is a good example for the
>>             "must-have-happened" case: we know that the condition on
>>             line 16 is very important.
>>
>>             I like to think that I made a very decent progress on the
>>             first part, as it's objectively a lot simpler. A great
>>             addition to the analyzer is that it now understands (at
>>             least, when my patches land) control dependencies in the
>>             CFG. With that, we can figure out that flag's value on
>>             line 16 is crucial, so we track it, resulting in notes
>>             being added on line 14, and on line 5, explaining that
>>             flag's value changed.
>>
>>             However, we are very unsure about how to tackle the
>>             second part on the project. I gave this a lot of thought,
>>             we had a couple meetings in private, and I'd like to
>>             share where I am with this.
>>
>>             My project proposed to implement a static backward
>>             program slicing algorithm which would definitely be
>>             amazing, but seems to be an unrealistic approach.
>>             Basically, any slicing algorithm that is interprocedural
>>             would require a system dependence graph, something we
>>             just simply don't have, not even for LLVM.
>>
>>             Whenever pointer semantics are in the picture, we have a
>>             really hard time: it's something that is super difficult
>>             to reason about without symbolic execution, and it would
>>             probably be a good idea not to tackle such cases until we
>>             come up with something clever (maybe the pset calculation
>>             Gábor Horváth is working on?). We should, however, being
>>             able to handle reference parameters.
>>
>>             So then, what's the goal exactly?
>>
>>             We'd like to teach the analyzer that some code blocks
>>             (even in other functions) could have written the variable
>>             that is responsible for the bug (x, in the included
>>             example), and is very important to the bug report. We
>>             could use this information to track variables similarly
>>             to the "must-have-happened" case: track conditions that
>>             prevented the write from happening.
>>
>>             To the specifics. I think it's reasonable to implement an
>>             /intraprocedural/ slicing algorithm. For the included
>>             example, I think we should be able to discover the
>>             potential write to x on line 9. We don't need anything we
>>             already don't have to achieve this.
>>
>>             Reasoning across function boundaries could be achieved by
>>             inlining functions (as I understand it, simply insert the
>>             function's CFGBlocks), but there are big unknowns in this.
>>
>>             So my question is, do you think such inlining is
>>             possible? If not, should I just create a primitive
>>             variable-to-parameter mapping? Is there something else on
>>             the table?
>>
>>             Cheers!
>>             Kristóf Umann
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/0aa94d97/attachment.html>