[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 10 19:16:09 PDT 2019



On 6/10/19 11:03 AM, Kristóf Umann wrote:
>
>
> On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>> wrote:
>
>     The problem we're solving is *entirely* about inter-procedural
>     analysis. Like, as long as there are no inlined calls on the path
>     (and therefore no pruning is being done), then there's no problem
>     at all: the user can easily see that the value is untouched by
>     direct observation.
>
>
> I have explained myself poorly. I intend to implement an 
> /intraprocedural/ slicing, but that would work /interprocedurally/ by 
> resolving function calls with the help of the analyzer. This is 
> important, because for the by-the-book interprocedural slicing you'd 
> need a system dependence graph. Now, if a function isn't inlined, we 
> need to do this on our own. I see a couple unknowns here (how deep do 
> we explore such functions? how to we resolve non-trivial parameter 
> passing? are we sure that tracking expressions here is actually 
> meaningful?) but I think tackling simple cases is achievable.
>
>     In this sense i find the lack of note in example 2 completely
>     non-problematic (as long as control dependency tracking lands).
>     It's not a problem that the user isn't told that bar() could have
>     initialized 'x', because the user already knows that bar() is not
>     even called in the first place.
>
>
> Would you agree that tracking flag however would be good?

Yes, totally.

>     However, there may be a combination of category 1 and category 2
>     that is much more interesting: what if main() unconditionally
>     calls foo() that conditionally calls bar() that unconditionally
>     initializes the value? Eg.:
>
>     void bar(int *x) {
>       *x = 1;  // the interesting event that doesn't happen
>     }
>
>     void foo(int *x) {
>       if (rand())  // control dependency of the interesting event
>         bar(x);
>     }
>
>     int main() {
>       int x;
>       foo(&x);
>       use(x); // use of uninitialized value
>     }
>
>     In this case it is essential to highlight the path within foo() so
>     that to explain how it fails to call bar().
>
>     On 09.06.2019 17:50, Kristóf Umann wrote:
>>     I gave this some thought, and realized that we could divide the
>>     "should-not-have-happened" cases into 2 categories:
>>
>>     1. The unexplored block is within a function that the analyzer
>>     visited.
>>     2. The unexplored block lies within a function that the analyzer
>>     didn't even visit.
>>
>>     For example:
>>
>>     01 int flag;
>>
>>     02 bool coin();
>>
>>     03
>>
>>     04 void foo() {
>>
>>     05   flag = coin();// no note, but there should be one
>>
>>     06 }
>>
>>     07
>>
>>     08 void bar(int &i) {
>>
>>     09  if (flag) // assumed false
>>
>>     10    i = new int;
>>
>>     11 }
>>
>>     12
>>
>>     13 int main() {
>>
>>     14   int *x = 0; // x initialized to 0
>>
>>     15   flag = 1;
>>
>>     16   foo();// no note, but there should be one
>>
>>     17   bar(x);
>>
>>     18   foo(); // no note, but there should be one
>>
>>     19
>>
>>     20   if (flag) // assumed true
>>
>>     21     *x = 5; // warn
>>
>>     22 }
>>
>>
>>     Observe the subtle difference that on line 17, function bar is
>>     called, and the first branch lies in that function. This is
>>     definitely a "category 1" case, since the analyzer inlined and
>>     visited bar(), but again, failed the realize the importance of
>>     flag due to not visiting line 10, and makes no mention of line 16
>>     in the bugreport.
>>
>>     01 int flag;
>>
>>     02 bool coin();
>>
>>     03
>>
>>     04 void foo() {
>>
>>     05   flag = coin();// no note, but there should be one
>>
>>     06 }
>>
>>     07
>>
>>     08 void bar(int &i) {
>>
>>     09  i = new int;
>>
>>     10 }
>>
>>     11
>>
>>     12 int main() {
>>
>>     13   int *x = 0; // x initialized to 0
>>
>>     14   flag = 1;
>>
>>     15   foo();// no note, but there should be one
>>
>>     16 if (flag) // assumed false
>>
>>     17    bar(x);
>>
>>     18   foo(); // no note, but there should be one
>>
>>     19
>>
>>     20   if (flag) // assumed true
>>
>>     21     *x = 5; // warn
>>
>>     22 }
>>
>>
>>     In this example, the if statement stays, but the assignment to x
>>     is now found in bar(). In this case, the analyzer didn't explore
>>     bar(), so it's a "category 2" case.
>>
>>
>>     Now, the reason why these categories are important is that if the
>>     analyzer actually explored a function, it solves the problem of
>>     parameters: for the example shown for "category 1", we can simply
>>     ask the analyzer whether x in function main() is the same as i in
>>     function bar(). This is, as stated in my previous letter, far
>>     more difficult for "category 2" cases.
>>
>>
>>     I think when I'm finished with the "must-have-happened" cases, I
>>     could start enhancing NoStoreFuncVisitor to gather conditions
>>     possibly worth tracking for "category 1" cases, see whether an
>>     intraprocedural slicing algorithm makes sense, could worry about
>>     resolving parameters for the other case one when I get there.
>>
>>
>>     Any thoughts on this approach? :)
>>
>>
>>     On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com
>>     <mailto:dkszelethus at gmail.com>> wrote:
>>
>>         Hi!
>>
>>         I'm currently working on a GSoC project that aims to improve
>>         the description we provide for bug reports the Static
>>         Analyzer emits.
>>
>>         We divided the problem (that the bugreports didn't explain
>>         how the bug came about very well) into two parts:
>>
>>         1. The information is available in the bug path (basically
>>         parts of the code the analyzer actually explored on that
>>         particular path of execution). We refer to these as the
>>         "must-have-happened" case.
>>         2. The information isn't available, possibly not even in the
>>         ExplodedGraph (parts of the program the analyzer explored on
>>         all paths of execution). We call this the
>>         "should-not-have-happened" case.
>>
>>         01 int flag;
>>
>>         02 bool coin();
>>
>>         03
>>
>>         04 void foo() {
>>
>>         05   flag = coin();// no note, but there should be one
>>
>>         06 }
>>
>>         07
>>
>>         08 int main() {
>>
>>         09   int *x = 0; // x initialized to 0
>>
>>         10   flag = 1;
>>
>>         11   foo();// no note, but there should be one
>>
>>         12   if (flag) // assumed false
>>
>>         13     x = new int;
>>
>>         14   foo(); // no note, but there should be one
>>
>>         15
>>
>>         16   if (flag) // assumed true
>>
>>         17     *x = 5; // warn
>>
>>         18 }
>>
>>
>>         The first branch is a good example for the
>>         "should-not-have-happened" case: the bug path doesn't contain
>>         line 13, so we don't really know whether the condition on
>>         line 12 is a big deal, which in this case it is (unlike if it
>>         only guarded a print of some sort)
>>
>>         The second branch is a good example for the
>>         "must-have-happened" case: we know that the condition on line
>>         16 is very important.
>>
>>         I like to think that I made a very decent progress on the
>>         first part, as it's objectively a lot simpler. A great
>>         addition to the analyzer is that it now understands (at
>>         least, when my patches land) control dependencies in the CFG.
>>         With that, we can figure out that flag's value on line 16 is
>>         crucial, so we track it, resulting in notes being added on
>>         line 14, and on line 5, explaining that flag's value changed.
>>
>>         However, we are very unsure about how to tackle the second
>>         part on the project. I gave this a lot of thought, we had a
>>         couple meetings in private, and I'd like to share where I am
>>         with this.
>>
>>         My project proposed to implement a static backward program
>>         slicing algorithm which would definitely be amazing, but
>>         seems to be an unrealistic approach. Basically, any slicing
>>         algorithm that is interprocedural would require a system
>>         dependence graph, something we just simply don't have, not
>>         even for LLVM.
>>
>>         Whenever pointer semantics are in the picture, we have a
>>         really hard time: it's something that is super difficult to
>>         reason about without symbolic execution, and it would
>>         probably be a good idea not to tackle such cases until we
>>         come up with something clever (maybe the pset calculation
>>         Gábor Horváth is working on?). We should, however, being able
>>         to handle reference parameters.
>>
>>         So then, what's the goal exactly?
>>
>>         We'd like to teach the analyzer that some code blocks (even
>>         in other functions) could have written the variable that is
>>         responsible for the bug (x, in the included example), and is
>>         very important to the bug report. We could use this
>>         information to track variables similarly to the
>>         "must-have-happened" case: track conditions that prevented
>>         the write from happening.
>>
>>         To the specifics. I think it's reasonable to implement an
>>         /intraprocedural/ slicing algorithm. For the included
>>         example, I think we should be able to discover the potential
>>         write to x on line 9. We don't need anything we already don't
>>         have to achieve this.
>>
>>         Reasoning across function boundaries could be achieved by
>>         inlining functions (as I understand it, simply insert the
>>         function's CFGBlocks), but there are big unknowns in this.
>>
>>         So my question is, do you think such inlining is possible? If
>>         not, should I just create a primitive variable-to-parameter
>>         mapping? Is there something else on the table?
>>
>>         Cheers!
>>         Kristóf Umann
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/8c3bd867/attachment.html>


More information about the cfe-dev mailing list