[cfe-dev] [analyzer] Discovering information not contained in the bugpath
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Sun Jun 9 20:03:46 PDT 2019
The problem we're solving is *entirely* about inter-procedural analysis.
Like, as long as there are no inlined calls on the path (and therefore
no pruning is being done), then there's no problem at all: the user can
easily see that the value is untouched by direct observation.
In this sense i find the lack of note in example 2 completely
non-problematic (as long as control dependency tracking lands). It's not
a problem that the user isn't told that bar() could have initialized
'x', because the user already knows that bar() is not even called in the
first place.
However, there may be a combination of category 1 and category 2 that is
much more interesting: what if main() unconditionally calls foo() that
conditionally calls bar() that unconditionally initializes the value? Eg.:
void bar(int *x) {
*x = 1; // the interesting event that doesn't happen
}
void foo(int *x) {
if (rand()) // control dependency of the interesting event
bar(x);
}
int main() {
int x;
foo(&x);
use(x); // use of uninitialized value
}
In this case it is essential to highlight the path within foo() so that
to explain how it fails to call bar().
On 09.06.2019 17:50, Kristóf Umann wrote:
> I gave this some thought, and realized that we could divide the
> "should-not-have-happened" cases into 2 categories:
>
> 1. The unexplored block is within a function that the analyzer visited.
> 2. The unexplored block lies within a function that the analyzer
> didn't even visit.
>
> For example:
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05 flag = coin();// no note, but there should be one
>
> 06 }
>
> 07
>
> 08 void bar(int &i) {
>
> 09 if (flag) // assumed false
>
> 10 i = new int;
>
> 11 }
>
> 12
>
> 13 int main() {
>
> 14 int *x = 0; // x initialized to 0
>
> 15 flag = 1;
>
> 16 foo();// no note, but there should be one
>
> 17 bar(x);
>
> 18 foo(); // no note, but there should be one
>
> 19
>
> 20 if (flag) // assumed true
>
> 21 *x = 5; // warn
>
> 22 }
>
>
> Observe the subtle difference that on line 17, function bar is called,
> and the first branch lies in that function. This is definitely a
> "category 1" case, since the analyzer inlined and visited bar(), but
> again, failed the realize the importance of flag due to not visiting
> line 10, and makes no mention of line 16 in the bugreport.
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05 flag = coin();// no note, but there should be one
>
> 06 }
>
> 07
>
> 08 void bar(int &i) {
>
> 09 i = new int;
>
> 10 }
>
> 11
>
> 12 int main() {
>
> 13 int *x = 0; // x initialized to 0
>
> 14 flag = 1;
>
> 15 foo();// no note, but there should be one
>
> 16 if (flag) // assumed false
>
> 17 bar(x);
>
> 18 foo(); // no note, but there should be one
>
> 19
>
> 20 if (flag) // assumed true
>
> 21 *x = 5; // warn
>
> 22 }
>
>
> In this example, the if statement stays, but the assignment to x is
> now found in bar(). In this case, the analyzer didn't explore bar(),
> so it's a "category 2" case.
>
>
> Now, the reason why these categories are important is that if the
> analyzer actually explored a function, it solves the problem of
> parameters: for the example shown for "category 1", we can simply ask
> the analyzer whether x in function main() is the same as i in function
> bar(). This is, as stated in my previous letter, far more difficult
> for "category 2" cases.
>
>
> I think when I'm finished with the "must-have-happened" cases, I could
> start enhancing NoStoreFuncVisitor to gather conditions possibly worth
> tracking for "category 1" cases, see whether an intraprocedural
> slicing algorithm makes sense, could worry about resolving parameters
> for the other case one when I get there.
>
>
> Any thoughts on this approach? :)
>
>
> On Sat, 8 Jun 2019 at 01:59, Kristóf Umann <dkszelethus at gmail.com
> <mailto:dkszelethus at gmail.com>> wrote:
>
> Hi!
>
> I'm currently working on a GSoC project that aims to improve the
> description we provide for bug reports the Static Analyzer emits.
>
> We divided the problem (that the bugreports didn't explain how the
> bug came about very well) into two parts:
>
> 1. The information is available in the bug path (basically parts
> of the code the analyzer actually explored on that particular path
> of execution). We refer to these as the "must-have-happened" case.
> 2. The information isn't available, possibly not even in the
> ExplodedGraph (parts of the program the analyzer explored on all
> paths of execution). We call this the "should-not-have-happened" case.
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05 flag = coin();// no note, but there should be one
>
> 06 }
>
> 07
>
> 08 int main() {
>
> 09 int *x = 0; // x initialized to 0
>
> 10 flag = 1;
>
> 11 foo();// no note, but there should be one
>
> 12 if (flag) // assumed false
>
> 13 x = new int;
>
> 14 foo(); // no note, but there should be one
>
> 15
>
> 16 if (flag) // assumed true
>
> 17 *x = 5; // warn
>
> 18 }
>
>
> The first branch is a good example for the
> "should-not-have-happened" case: the bug path doesn't contain line
> 13, so we don't really know whether the condition on line 12 is a
> big deal, which in this case it is (unlike if it only guarded a
> print of some sort)
>
> The second branch is a good example for the "must-have-happened"
> case: we know that the condition on line 16 is very important.
>
> I like to think that I made a very decent progress on the first
> part, as it's objectively a lot simpler. A great addition to the
> analyzer is that it now understands (at least, when my patches
> land) control dependencies in the CFG. With that, we can figure
> out that flag's value on line 16 is crucial, so we track it,
> resulting in notes being added on line 14, and on line 5,
> explaining that flag's value changed.
>
> However, we are very unsure about how to tackle the second part on
> the project. I gave this a lot of thought, we had a couple
> meetings in private, and I'd like to share where I am with this.
>
> My project proposed to implement a static backward program slicing
> algorithm which would definitely be amazing, but seems to be an
> unrealistic approach. Basically, any slicing algorithm that is
> interprocedural would require a system dependence graph, something
> we just simply don't have, not even for LLVM.
>
> Whenever pointer semantics are in the picture, we have a really
> hard time: it's something that is super difficult to reason about
> without symbolic execution, and it would probably be a good idea
> not to tackle such cases until we come up with something clever
> (maybe the pset calculation Gábor Horváth is working on?). We
> should, however, being able to handle reference parameters.
>
> So then, what's the goal exactly?
>
> We'd like to teach the analyzer that some code blocks (even in
> other functions) could have written the variable that is
> responsible for the bug (x, in the included example), and is very
> important to the bug report. We could use this information to
> track variables similarly to the "must-have-happened" case: track
> conditions that prevented the write from happening.
>
> To the specifics. I think it's reasonable to implement an
> /intraprocedural/ slicing algorithm. For the included example, I
> think we should be able to discover the potential write to x on
> line 9. We don't need anything we already don't have to achieve this.
>
> Reasoning across function boundaries could be achieved by inlining
> functions (as I understand it, simply insert the function's
> CFGBlocks), but there are big unknowns in this.
>
> So my question is, do you think such inlining is possible? If not,
> should I just create a primitive variable-to-parameter mapping? Is
> there something else on the table?
>
> Cheers!
> Kristóf Umann
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190609/b26729a5/attachment.html>
More information about the cfe-dev
mailing list