[cfe-dev] [analyzer] Discovering information not contained in the bugpath

Kristóf Umann via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 10 18:46:31 PDT 2019


I rewrote my sketch algorithm that showcases how I imagine this to work.

https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit

On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <dkszelethus at gmail.com> wrote:

>
>
> On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com> wrote:
>
>> The problem we're solving is *entirely* about inter-procedural analysis.
>> Like, as long as there are no inlined calls on the path (and therefore no
>> pruning is being done), then there's no problem at all: the user can easily
>> see that the value is untouched by direct observation.
>>
>
> I have explained myself poorly. I intend to implement an *intraprocedural* slicing,
> but that would work *interprocedurally* by resolving function calls with
> the help of the analyzer. This is important, because for the by-the-book
> interprocedural slicing you'd need a system dependence graph. Now, if a
> function isn't inlined, we need to do this on our own. I see a couple
> unknowns here (how deep do we explore such functions? how to we resolve
> non-trivial parameter passing? are we sure that tracking expressions here
> is actually meaningful?) but I think tackling simple cases is achievable.
>
>
>> In this sense i find the lack of note in example 2 completely
>> non-problematic (as long as control dependency tracking lands). It's not a
>> problem that the user isn't told that bar() could have initialized 'x',
>> because the user already knows that bar() is not even called in the first
>> place.
>>
>
> Would you agree that tracking flag however would be good?
>
>
>> However, there may be a combination of category 1 and category 2 that is
>> much more interesting: what if main() unconditionally calls foo() that
>> conditionally calls bar() that unconditionally initializes the value? Eg.:
>>
>> void bar(int *x) {
>>   *x = 1;  // the interesting event that doesn't happen
>> }
>>
>> void foo(int *x) {
>>   if (rand())  // control dependency of the interesting event
>>     bar(x);
>> }
>>
>> int main() {
>>   int x;
>>   foo(&x);
>>   use(x); // use of uninitialized value
>> }
>>
>> In this case it is essential to highlight the path within foo() so that
>> to explain how it fails to call bar().
>>
> On 09.06.2019 17:50, Kristóf Umann wrote:
>>
>> I gave this some thought, and realized that we could divide the
>> "should-not-have-happened" cases into 2 categories:
>>
>> 1. The unexplored block is within a function that the analyzer visited.
>> 2. The unexplored block lies within a function that the analyzer didn't
>> even visit.
>>
>> For example:
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05   flag = coin(); // no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09  if (flag) // assumed false
>>
>> 10    i = new int;
>>
>> 11 }
>>
>> 12
>>
>> 13 int main() {
>>
>> 14   int *x = 0; // x initialized to 0
>>
>> 15   flag = 1;
>>
>> 16   foo(); // no note, but there should be one
>>
>> 17   bar(x);
>>
>> 18   foo(); // no note, but there should be one
>>
>> 19
>>
>> 20   if (flag) // assumed true
>>
>> 21     *x = 5; // warn
>>
>> 22 }
>>
>>
>> Observe the subtle difference that on line 17, function bar is called,
>> and the first branch lies in that function. This is definitely a "category
>> 1" case, since the analyzer inlined and visited bar(), but again, failed
>> the realize the importance of flag due to not visiting line 10, and makes
>> no mention of line 16 in the bugreport.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05   flag = coin(); // no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09  i = new int;
>>
>> 10 }
>>
>> 11
>>
>> 12 int main() {
>>
>> 13   int *x = 0; // x initialized to 0
>>
>> 14   flag = 1;
>>
>> 15   foo(); // no note, but there should be one
>>
>> 16 if (flag) // assumed false
>>
>> 17    bar(x);
>>
>> 18   foo(); // no note, but there should be one
>>
>> 19
>>
>> 20   if (flag) // assumed true
>>
>> 21     *x = 5; // warn
>>
>> 22 }
>>
>>
>> In this example, the if statement stays, but the assignment to x is now
>> found in bar(). In this case, the analyzer didn't explore bar(), so it's
>> a "category 2" case.
>>
>>
>> Now, the reason why these categories are important is that if the
>> analyzer actually explored a function, it solves the problem of parameters:
>> for the example shown for "category 1", we can simply ask the analyzer
>> whether x in function main() is the same as i in function bar(). This
>> is, as stated in my previous letter, far more difficult for "category 2"
>> cases.
>>
>>
>> I think when I'm finished with the "must-have-happened" cases, I could
>> start enhancing NoStoreFuncVisitor to gather conditions possibly worth
>> tracking for "category 1" cases, see whether an intraprocedural slicing
>> algorithm makes sense, could worry about resolving parameters for the other
>> case one when I get there.
>>
>>
>> Any thoughts on this approach? :)
>>
>> On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I'm currently working on a GSoC project that aims to improve the
>>> description we provide for bug reports the Static Analyzer emits.
>>>
>>> We divided the problem (that the bugreports didn't explain how the bug
>>> came about very well) into two parts:
>>>
>>> 1. The information is available in the bug path (basically parts of the
>>> code the analyzer actually explored on that particular path of execution).
>>> We refer to these as the "must-have-happened" case.
>>> 2. The information isn't available, possibly not even in the
>>> ExplodedGraph (parts of the program the analyzer explored on all paths of
>>> execution). We call this the "should-not-have-happened" case.
>>>
>>> 01 int flag;
>>>
>>> 02 bool coin();
>>>
>>> 03
>>>
>>> 04 void foo() {
>>>
>>> 05   flag = coin(); // no note, but there should be one
>>>
>>> 06 }
>>>
>>> 07
>>>
>>> 08 int main() {
>>>
>>> 09   int *x = 0; // x initialized to 0
>>>
>>> 10   flag = 1;
>>>
>>> 11   foo(); // no note, but there should be one
>>>
>>> 12   if (flag) // assumed false
>>>
>>> 13     x = new int;
>>>
>>> 14   foo(); // no note, but there should be one
>>>
>>> 15
>>>
>>> 16   if (flag) // assumed true
>>>
>>> 17     *x = 5; // warn
>>>
>>> 18 }
>>>
>>> The first branch is a good example for the "should-not-have-happened"
>>> case: the bug path doesn't contain line 13, so we don't really know whether
>>> the condition on line 12 is a big deal, which in this case it is (unlike if
>>> it only guarded a print of some sort)
>>>
>>> The second branch is a good example for the "must-have-happened" case:
>>> we know that the condition on line 16 is very important.
>>>
>>> I like to think that I made a very decent progress on the first part, as
>>> it's objectively a lot simpler. A great addition to the analyzer is that it
>>> now understands (at least, when my patches land) control dependencies in
>>> the CFG. With that, we can figure out that flag's value on line 16 is
>>> crucial, so we track it, resulting in notes being added on line 14, and on
>>> line 5, explaining that flag's value changed.
>>>
>>> However, we are very unsure about how to tackle the second part on the
>>> project. I gave this a lot of thought, we had a couple meetings in private,
>>> and I'd like to share where I am with this.
>>>
>>> My project proposed to implement a static backward program slicing
>>> algorithm which would definitely be amazing, but seems to be an unrealistic
>>> approach. Basically, any slicing algorithm that is interprocedural would
>>> require a system dependence graph, something we just simply don't have, not
>>> even for LLVM.
>>>
>>> Whenever pointer semantics are in the picture, we have a really hard
>>> time: it's something that is super difficult to reason about without
>>> symbolic execution, and it would probably be a good idea not to tackle such
>>> cases until we come up with something clever (maybe the pset calculation
>>> Gábor Horváth is working on?). We should, however, being able to handle
>>> reference parameters.
>>>
>>> So then, what's the goal exactly?
>>>
>>> We'd like to teach the analyzer that some code blocks (even in other
>>> functions) could have written the variable that is responsible for the bug
>>> (x, in the included example), and is very important to the bug report. We
>>> could use this information to track variables similarly to the
>>> "must-have-happened" case: track conditions that prevented the write from
>>> happening.
>>>
>>> To the specifics. I think it's reasonable to implement an
>>> *intraprocedural* slicing algorithm. For the included example, I think
>>> we should be able to discover the potential write to x on line 9. We don't
>>> need anything we already don't have to achieve this.
>>>
>>> Reasoning across function boundaries could be achieved by inlining
>>> functions (as I understand it, simply insert the function's CFGBlocks), but
>>> there are big unknowns in this.
>>>
>>> So my question is, do you think such inlining is possible? If not,
>>> should I just create a primitive variable-to-parameter mapping? Is there
>>> something else on the table?
>>>
>>> Cheers!
>>> Kristóf Umann
>>>
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190611/6123941f/attachment.html>


More information about the cfe-dev mailing list