[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Mon Jun 10 11:03:03 PDT 2019

On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com> wrote:

> The problem we're solving is *entirely* about inter-procedural analysis.
> Like, as long as there are no inlined calls on the path (and therefore no
> pruning is being done), then there's no problem at all: the user can easily
> see that the value is untouched by direct observation.
>

I have explained myself poorly. I intend to implement an
*intraprocedural* slicing,
but that would work *interprocedurally* by resolving function calls with
the help of the analyzer. This is important, because for the by-the-book
interprocedural slicing you'd need a system dependence graph. Now, if a
function isn't inlined, we need to do this on our own. I see a couple
unknowns here (how deep do we explore such functions? how to we resolve
non-trivial parameter passing? are we sure that tracking expressions here
is actually meaningful?) but I think tackling simple cases is achievable.

> In this sense i find the lack of note in example 2 completely
> non-problematic (as long as control dependency tracking lands). It's not a
> problem that the user isn't told that bar() could have initialized 'x',
> because the user already knows that bar() is not even called in the first
> place.
>

Would you agree that tracking flag however would be good?

> However, there may be a combination of category 1 and category 2 that is
> much more interesting: what if main() unconditionally calls foo() that
> conditionally calls bar() that unconditionally initializes the value? Eg.:
>
> void bar(int *x) {
>   *x = 1;  // the interesting event that doesn't happen
> }
>
> void foo(int *x) {
>   if (rand())  // control dependency of the interesting event
>     bar(x);
> }
>
> int main() {
>   int x;
>   foo(&x);
>   use(x); // use of uninitialized value
> }
>
> In this case it is essential to highlight the path within foo() so that to
> explain how it fails to call bar().
>
On 09.06.2019 17:50, Kristóf Umann wrote:
>
> I gave this some thought, and realized that we could divide the
> "should-not-have-happened" cases into 2 categories:
>
> 1. The unexplored block is within a function that the analyzer visited.
> 2. The unexplored block lies within a function that the analyzer didn't
> even visit.
>
> For example:
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05   flag = coin(); // no note, but there should be one
>
> 06 }
>
> 07
>
> 08 void bar(int &i) {
>
> 09  if (flag) // assumed false
>
> 10    i = new int;
>
> 11 }
>
> 12
>
> 13 int main() {
>
> 14   int *x = 0; // x initialized to 0
>
> 15   flag = 1;
>
> 16   foo(); // no note, but there should be one
>
> 17   bar(x);
>
> 18   foo(); // no note, but there should be one
>
> 19
>
> 20   if (flag) // assumed true
>
> 21     *x = 5; // warn
>
> 22 }
>
>
> Observe the subtle difference that on line 17, function bar is called, and
> the first branch lies in that function. This is definitely a "category 1"
> case, since the analyzer inlined and visited bar(), but again, failed the
> realize the importance of flag due to not visiting line 10, and makes no
> mention of line 16 in the bugreport.
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05   flag = coin(); // no note, but there should be one
>
> 06 }
>
> 07
>
> 08 void bar(int &i) {
>
> 09  i = new int;
>
> 10 }
>
> 11
>
> 12 int main() {
>
> 13   int *x = 0; // x initialized to 0
>
> 14   flag = 1;
>
> 15   foo(); // no note, but there should be one
>
> 16 if (flag) // assumed false
>
> 17    bar(x);
>
> 18   foo(); // no note, but there should be one
>
> 19
>
> 20   if (flag) // assumed true
>
> 21     *x = 5; // warn
>
> 22 }
>
>
> In this example, the if statement stays, but the assignment to x is now
> found in bar(). In this case, the analyzer didn't explore bar(), so it's
> a "category 2" case.
>
>
> Now, the reason why these categories are important is that if the analyzer
> actually explored a function, it solves the problem of parameters: for the
> example shown for "category 1", we can simply ask the analyzer whether x
> in function main() is the same as i in function bar(). This is, as stated
> in my previous letter, far more difficult for "category 2" cases.
>
>
> I think when I'm finished with the "must-have-happened" cases, I could
> start enhancing NoStoreFuncVisitor to gather conditions possibly worth
> tracking for "category 1" cases, see whether an intraprocedural slicing
> algorithm makes sense, could worry about resolving parameters for the other
> case one when I get there.
>
>
> Any thoughts on this approach? :)
>
> On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com> wrote:
>
>> Hi!
>>
>> I'm currently working on a GSoC project that aims to improve the
>> description we provide for bug reports the Static Analyzer emits.
>>
>> We divided the problem (that the bugreports didn't explain how the bug
>> came about very well) into two parts:
>>
>> 1. The information is available in the bug path (basically parts of the
>> code the analyzer actually explored on that particular path of execution).
>> We refer to these as the "must-have-happened" case.
>> 2. The information isn't available, possibly not even in the
>> ExplodedGraph (parts of the program the analyzer explored on all paths of
>> execution). We call this the "should-not-have-happened" case.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05   flag = coin(); // no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 int main() {
>>
>> 09   int *x = 0; // x initialized to 0
>>
>> 10   flag = 1;
>>
>> 11   foo(); // no note, but there should be one
>>
>> 12   if (flag) // assumed false
>>
>> 13     x = new int;
>>
>> 14   foo(); // no note, but there should be one
>>
>> 15
>>
>> 16   if (flag) // assumed true
>>
>> 17     *x = 5; // warn
>>
>> 18 }
>>
>> The first branch is a good example for the "should-not-have-happened"
>> case: the bug path doesn't contain line 13, so we don't really know whether
>> the condition on line 12 is a big deal, which in this case it is (unlike if
>> it only guarded a print of some sort)
>>
>> The second branch is a good example for the "must-have-happened" case: we
>> know that the condition on line 16 is very important.
>>
>> I like to think that I made a very decent progress on the first part, as
>> it's objectively a lot simpler. A great addition to the analyzer is that it
>> now understands (at least, when my patches land) control dependencies in
>> the CFG. With that, we can figure out that flag's value on line 16 is
>> crucial, so we track it, resulting in notes being added on line 14, and on
>> line 5, explaining that flag's value changed.
>>
>> However, we are very unsure about how to tackle the second part on the
>> project. I gave this a lot of thought, we had a couple meetings in private,
>> and I'd like to share where I am with this.
>>
>> My project proposed to implement a static backward program slicing
>> algorithm which would definitely be amazing, but seems to be an unrealistic
>> approach. Basically, any slicing algorithm that is interprocedural would
>> require a system dependence graph, something we just simply don't have, not
>> even for LLVM.
>>
>> Whenever pointer semantics are in the picture, we have a really hard
>> time: it's something that is super difficult to reason about without
>> symbolic execution, and it would probably be a good idea not to tackle such
>> cases until we come up with something clever (maybe the pset calculation
>> Gábor Horváth is working on?). We should, however, being able to handle
>> reference parameters.
>>
>> So then, what's the goal exactly?
>>
>> We'd like to teach the analyzer that some code blocks (even in other
>> functions) could have written the variable that is responsible for the bug
>> (x, in the included example), and is very important to the bug report. We
>> could use this information to track variables similarly to the
>> "must-have-happened" case: track conditions that prevented the write from
>> happening.
>>
>> To the specifics. I think it's reasonable to implement an
>> *intraprocedural* slicing algorithm. For the included example, I think
>> we should be able to discover the potential write to x on line 9. We don't
>> need anything we already don't have to achieve this.
>>
>> Reasoning across function boundaries could be achieved by inlining
>> functions (as I understand it, simply insert the function's CFGBlocks), but
>> there are big unknowns in this.
>>
>> So my question is, do you think such inlining is possible? If not, should
>> I just create a primitive variable-to-parameter mapping? Is there something
>> else on the table?
>>
>> Cheers!
>> Kristóf Umann
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/064caffd/attachment.html>