[cfe-dev] [analyzer] Discovering information not contained in the bugpath
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Jun 10 19:18:58 PDT 2019
Hmm, what do you mean by this line?:
trackExpressionValue(N, RHS of assignment, Report)
Like, how does the existing visitor infrastructure behave when the
statement is, by construction, never appears in the bug path? It doesn't
even have a value to track, as it never appeared in the Environment.
P.S. I just realized that this discussion would probably be messy to
read later through archives, as old revisions of the google doc are no
longer available.
On 6/10/19 6:46 PM, Kristóf Umann wrote:
> I rewrote my sketch algorithm that showcases how I imagine this to work.
>
> https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit
>
> On Mon, 10 Jun 2019 at 20:03, Kristóf Umann <dkszelethus at gmail.com
> <mailto:dkszelethus at gmail.com>> wrote:
>
>
>
> On Mon, 10 Jun 2019 at 05:03, Artem Dergachev <noqnoqneo at gmail.com
> <mailto:noqnoqneo at gmail.com>> wrote:
>
> The problem we're solving is *entirely* about inter-procedural
> analysis. Like, as long as there are no inlined calls on the
> path (and therefore no pruning is being done), then there's no
> problem at all: the user can easily see that the value is
> untouched by direct observation.
>
>
> I have explained myself poorly. I intend to implement an
> /intraprocedural/ slicing, but that would work
> /interprocedurally/ by resolving function calls with the help of
> the analyzer. This is important, because for the by-the-book
> interprocedural slicing you'd need a system dependence graph. Now,
> if a function isn't inlined, we need to do this on our own. I see
> a couple unknowns here (how deep do we explore such functions? how
> to we resolve non-trivial parameter passing? are we sure that
> tracking expressions here is actually meaningful?) but I think
> tackling simple cases is achievable.
>
> In this sense i find the lack of note in example 2 completely
> non-problematic (as long as control dependency tracking
> lands). It's not a problem that the user isn't told that bar()
> could have initialized 'x', because the user already knows
> that bar() is not even called in the first place.
>
>
> Would you agree that tracking flag however would be good?
>
> However, there may be a combination of category 1 and category
> 2 that is much more interesting: what if main()
> unconditionally calls foo() that conditionally calls bar()
> that unconditionally initializes the value? Eg.:
>
> void bar(int *x) {
> *x = 1; // the interesting event that doesn't happen
> }
>
> void foo(int *x) {
> if (rand()) // control dependency of the interesting event
> bar(x);
> }
>
> int main() {
> int x;
> foo(&x);
> use(x); // use of uninitialized value
> }
>
> In this case it is essential to highlight the path within
> foo() so that to explain how it fails to call bar().
>
> On 09.06.2019 17:50, Kristóf Umann wrote:
>> I gave this some thought, and realized that we could divide
>> the "should-not-have-happened" cases into 2 categories:
>>
>> 1. The unexplored block is within a function that the
>> analyzer visited.
>> 2. The unexplored block lies within a function that the
>> analyzer didn't even visit.
>>
>> For example:
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09 if (flag) // assumed false
>>
>> 10 i = new int;
>>
>> 11 }
>>
>> 12
>>
>> 13 int main() {
>>
>> 14 int *x = 0; // x initialized to 0
>>
>> 15 flag = 1;
>>
>> 16 foo();// no note, but there should be one
>>
>> 17 bar(x);
>>
>> 18 foo(); // no note, but there should be one
>>
>> 19
>>
>> 20 if (flag) // assumed true
>>
>> 21 *x = 5; // warn
>>
>> 22 }
>>
>>
>> Observe the subtle difference that on line 17, function bar
>> is called, and the first branch lies in that function. This
>> is definitely a "category 1" case, since the analyzer inlined
>> and visited bar(), but again, failed the realize the
>> importance of flag due to not visiting line 10, and makes no
>> mention of line 16 in the bugreport.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 void bar(int &i) {
>>
>> 09 i = new int;
>>
>> 10 }
>>
>> 11
>>
>> 12 int main() {
>>
>> 13 int *x = 0; // x initialized to 0
>>
>> 14 flag = 1;
>>
>> 15 foo();// no note, but there should be one
>>
>> 16 if (flag) // assumed false
>>
>> 17 bar(x);
>>
>> 18 foo(); // no note, but there should be one
>>
>> 19
>>
>> 20 if (flag) // assumed true
>>
>> 21 *x = 5; // warn
>>
>> 22 }
>>
>>
>> In this example, the if statement stays, but the assignment
>> to x is now found in bar(). In this case, the analyzer didn't
>> explore bar(), so it's a "category 2" case.
>>
>>
>> Now, the reason why these categories are important is that if
>> the analyzer actually explored a function, it solves the
>> problem of parameters: for the example shown for "category
>> 1", we can simply ask the analyzer whether x in function
>> main() is the same as i in function bar(). This is, as stated
>> in my previous letter, far more difficult for "category 2" cases.
>>
>>
>> I think when I'm finished with the "must-have-happened"
>> cases, I could start enhancing NoStoreFuncVisitor to gather
>> conditions possibly worth tracking for "category 1" cases,
>> see whether an intraprocedural slicing algorithm makes sense,
>> could worry about resolving parameters for the other case one
>> when I get there.
>>
>>
>> Any thoughts on this approach? :)
>>
>>
>> On Sat, 8 Jun 2019 at 01:59, Kristóf Umann
>> <dkszelethus at gmail.com <mailto:dkszelethus at gmail.com>> wrote:
>>
>> Hi!
>>
>> I'm currently working on a GSoC project that aims to
>> improve the description we provide for bug reports the
>> Static Analyzer emits.
>>
>> We divided the problem (that the bugreports didn't
>> explain how the bug came about very well) into two parts:
>>
>> 1. The information is available in the bug path
>> (basically parts of the code the analyzer actually
>> explored on that particular path of execution). We refer
>> to these as the "must-have-happened" case.
>> 2. The information isn't available, possibly not even in
>> the ExplodedGraph (parts of the program the analyzer
>> explored on all paths of execution). We call this the
>> "should-not-have-happened" case.
>>
>> 01 int flag;
>>
>> 02 bool coin();
>>
>> 03
>>
>> 04 void foo() {
>>
>> 05 flag = coin();// no note, but there should be one
>>
>> 06 }
>>
>> 07
>>
>> 08 int main() {
>>
>> 09 int *x = 0; // x initialized to 0
>>
>> 10 flag = 1;
>>
>> 11 foo();// no note, but there should be one
>>
>> 12 if (flag) // assumed false
>>
>> 13 x = new int;
>>
>> 14 foo(); // no note, but there should be one
>>
>> 15
>>
>> 16 if (flag) // assumed true
>>
>> 17 *x = 5; // warn
>>
>> 18 }
>>
>>
>> The first branch is a good example for the
>> "should-not-have-happened" case: the bug path doesn't
>> contain line 13, so we don't really know whether the
>> condition on line 12 is a big deal, which in this case it
>> is (unlike if it only guarded a print of some sort)
>>
>> The second branch is a good example for the
>> "must-have-happened" case: we know that the condition on
>> line 16 is very important.
>>
>> I like to think that I made a very decent progress on the
>> first part, as it's objectively a lot simpler. A great
>> addition to the analyzer is that it now understands (at
>> least, when my patches land) control dependencies in the
>> CFG. With that, we can figure out that flag's value on
>> line 16 is crucial, so we track it, resulting in notes
>> being added on line 14, and on line 5, explaining that
>> flag's value changed.
>>
>> However, we are very unsure about how to tackle the
>> second part on the project. I gave this a lot of thought,
>> we had a couple meetings in private, and I'd like to
>> share where I am with this.
>>
>> My project proposed to implement a static backward
>> program slicing algorithm which would definitely be
>> amazing, but seems to be an unrealistic approach.
>> Basically, any slicing algorithm that is interprocedural
>> would require a system dependence graph, something we
>> just simply don't have, not even for LLVM.
>>
>> Whenever pointer semantics are in the picture, we have a
>> really hard time: it's something that is super difficult
>> to reason about without symbolic execution, and it would
>> probably be a good idea not to tackle such cases until we
>> come up with something clever (maybe the pset calculation
>> Gábor Horváth is working on?). We should, however, being
>> able to handle reference parameters.
>>
>> So then, what's the goal exactly?
>>
>> We'd like to teach the analyzer that some code blocks
>> (even in other functions) could have written the variable
>> that is responsible for the bug (x, in the included
>> example), and is very important to the bug report. We
>> could use this information to track variables similarly
>> to the "must-have-happened" case: track conditions that
>> prevented the write from happening.
>>
>> To the specifics. I think it's reasonable to implement an
>> /intraprocedural/ slicing algorithm. For the included
>> example, I think we should be able to discover the
>> potential write to x on line 9. We don't need anything we
>> already don't have to achieve this.
>>
>> Reasoning across function boundaries could be achieved by
>> inlining functions (as I understand it, simply insert the
>> function's CFGBlocks), but there are big unknowns in this.
>>
>> So my question is, do you think such inlining is
>> possible? If not, should I just create a primitive
>> variable-to-parameter mapping? Is there something else on
>> the table?
>>
>> Cheers!
>> Kristóf Umann
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190610/0aa94d97/attachment.html>
More information about the cfe-dev
mailing list