[cfe-dev] [analyzer] Discovering information not contained in the bugpath
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Jun 10 19:16:09 PDT 2019
On 6/10/19 11:03 AM, Kristóf Umann wrote:
>
>
> On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com
> <mailto:noqnoqneo at gmail.com>> wrote:
>
> The problem we're solving is *entirely* about inter-procedural
> analysis. Like, as long as there are no inlined calls on the path
> (and therefore no pruning is being done), then there's no problem
> at all: the user can easily see that the value is untouched by
> direct observation.
>
>
> I have explained myself poorly. I intend to implement an
> /intraprocedural/ slicing, but that would work /interprocedurally/ by
> resolving function calls with the help of the analyzer. This is
> important, because for the by-the-book interprocedural slicing you'd
> need a system dependence graph. Now, if a function isn't inlined, we
> need to do this on our own. I see a couple unknowns here (how deep do
> we explore such functions? how to we resolve non-trivial parameter
> passing? are we sure that tracking expressions here is actually
> meaningful?) but I think tackling simple cases is achievable.
>
> In this sense i find the lack of note in example 2 completely
> non-problematic (as long as control dependency tracking lands).
> It's not a problem that the user isn't told that bar() could have
> initialized 'x', because the user already knows that bar() is not
> even called in the first place.
>
>
> Would you agree that tracking flag however would be good?
Yes, totally.
> However, there may be a combination of category 1 and category 2
> that is much more interesting: what if main() unconditionally
> calls foo() that conditionally calls bar() that unconditionally
> initializes the value? Eg.:
>
> void bar(int *x) {
> *x = 1; // the interesting event that doesn't happen
> }
>
> void foo(int *x) {
> if (rand()) // control dependency of the interesting event
> bar(x);
> }
>
> int main() {
> int x;
> foo(&x);
> use(x); // use of uninitialized value
> }
>
> In this case it is essential to highlight the path within foo() so
> that to explain how it fails to call bar().
>
> On 09.06.2019 17:50, Kristóf Umann wrote:
>> I gave this some thought, and realized that we could divide the
>> "should-not-have-happened" cases into 2 categories:
>>
>> 1. The unexplored block is within a function that the analyzer
>> visited.
>> 2. The unexplored block lies within a function that the analyzer
>> didn't even visit.
>>
>> For example:
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09 if (flag) // assumed false
>>
>> 10 i = new int;
>>
>> 11 }
>>
>> 12
>>
>> 13 int main() {
>>
>> 14 int *x = 0; // x initialized to 0
>>
>> 15 flag = 1;
>>
>> 16 foo();// no note, but there should be one
>>
>> 17 bar(x);
>>
>> 18 foo(); // no note, but there should be one
>>
>> 19
>>
>> 20 if (flag) // assumed true
>>
>> 21 *x = 5; // warn
>>
>> 22 }
>>
>>
>> Observe the subtle difference that on line 17, function bar is
>> called, and the first branch lies in that function. This is
>> definitely a "category 1" case, since the analyzer inlined and
>> visited bar(), but again, failed the realize the importance of
>> flag due to not visiting line 10, and makes no mention of line 16
>> in the bugreport.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09 i = new int;
>>
>> 10 }
>>
>> 11
>>
>> 12 int main() {
>>
>> 13 int *x = 0; // x initialized to 0
>>
>> 14 flag = 1;
>>
>> 15 foo();// no note, but there should be one
>>
>> 16 if (flag) // assumed false
>>
>> 17 bar(x);
>>
>> 18 foo(); // no note, but there should be one
>>
>> 19
>>
>> 20 if (flag) // assumed true
>>
>> 21 *x = 5; // warn
>>
>> 22 }
>>
>>
>> In this example, the if statement stays, but the assignment to x
>> is now found in bar(). In this case, the analyzer didn't explore
>> bar(), so it's a "category 2" case.
>>
>>
>> Now, the reason why these categories are important is that if the
>> analyzer actually explored a function, it solves the problem of
>> parameters: for the example shown for "category 1", we can simply
>> ask the analyzer whether x in function main() is the same as i in
>> function bar(). This is, as stated in my previous letter, far
>> more difficult for "category 2" cases.
>>
>>
>> I think when I'm finished with the "must-have-happened" cases, I
>> could start enhancing NoStoreFuncVisitor to gather conditions
>> possibly worth tracking for "category 1" cases, see whether an
>> intraprocedural slicing algorithm makes sense, could worry about
>> resolving parameters for the other case one when I get there.
>>
>>
>> Any thoughts on this approach? :)
>>
>>
>> On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com
>> <mailto:dkszelethus at gmail.com>> wrote:
>>
>> Hi!
>>
>> I'm currently working on a GSoC project that aims to improve
>> the description we provide for bug reports the Static
>> Analyzer emits.
>>
>> We divided the problem (that the bugreports didn't explain
>> how the bug came about very well) into two parts:
>>
>> 1. The information is available in the bug path (basically
>> parts of the code the analyzer actually explored on that
>> particular path of execution). We refer to these as the
>> "must-have-happened" case.
>> 2. The information isn't available, possibly not even in the
>> ExplodedGraph (parts of the program the analyzer explored on
>> all paths of execution). We call this the
>> "should-not-have-happened" case.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 int main() {
>>
>> 09 int *x = 0; // x initialized to 0
>>
>> 10 flag = 1;
>>
>> 11 foo();// no note, but there should be one
>>
>> 12 if (flag) // assumed false
>>
>> 13 x = new int;
>>
>> 14 foo(); // no note, but there should be one
>>
>> 15
>>
>> 16 if (flag) // assumed true
>>
>> 17 *x = 5; // warn
>>
>> 18 }
>>
>>
>> The first branch is a good example for the
>> "should-not-have-happened" case: the bug path doesn't contain
>> line 13, so we don't really know whether the condition on
>> line 12 is a big deal, which in this case it is (unlike if it
>> only guarded a print of some sort)
>>
>> The second branch is a good example for the
>> "must-have-happened" case: we know that the condition on line
>> 16 is very important.
>>
>> I like to think that I made a very decent progress on the
>> first part, as it's objectively a lot simpler. A great
>> addition to the analyzer is that it now understands (at
>> least, when my patches land) control dependencies in the CFG.
>> With that, we can figure out that flag's value on line 16 is
>> crucial, so we track it, resulting in notes being added on
>> line 14, and on line 5, explaining that flag's value changed.
>>
>> However, we are very unsure about how to tackle the second
>> part on the project. I gave this a lot of thought, we had a
>> couple meetings in private, and I'd like to share where I am
>> with this.
>>
>> My project proposed to implement a static backward program
>> slicing algorithm which would definitely be amazing, but
>> seems to be an unrealistic approach. Basically, any slicing
>> algorithm that is interprocedural would require a system
>> dependence graph, something we just simply don't have, not
>> even for LLVM.
>>
>> Whenever pointer semantics are in the picture, we have a
>> really hard time: it's something that is super difficult to
>> reason about without symbolic execution, and it would
>> probably be a good idea not to tackle such cases until we
>> come up with something clever (maybe the pset calculation
>> Gábor Horváth is working on?). We should, however, being able
>> to handle reference parameters.
>>
>> So then, what's the goal exactly?
>>
>> We'd like to teach the analyzer that some code blocks (even
>> in other functions) could have written the variable that is
>> responsible for the bug (x, in the included example), and is
>> very important to the bug report. We could use this
>> information to track variables similarly to the
>> "must-have-happened" case: track conditions that prevented
>> the write from happening.
>>
>> To the specifics. I think it's reasonable to implement an
>> /intraprocedural/ slicing algorithm. For the included
>> example, I think we should be able to discover the potential
>> write to x on line 9. We don't need anything we already don't
>> have to achieve this.
>>
>> Reasoning across function boundaries could be achieved by
>> inlining functions (as I understand it, simply insert the
>> function's CFGBlocks), but there are big unknowns in this.
>>
>> So my question is, do you think such inlining is possible? If
>> not, should I just create a primitive variable-to-parameter
>> mapping? Is there something else on the table?
>>
>> Cheers!
>> Kristóf Umann
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/8c3bd867/attachment.html>
More information about the cfe-dev
mailing list