[cfe-dev] [analyzer] Tracking values for generating bug reports
    Gábor Horváth via cfe-dev 
    cfe-dev at lists.llvm.org
       
    Tue Jun 18 15:44:03 PDT 2019
    
    
  
Hi Kristóf,
Thanks for the report. I have been thinking about what kind of user
experience should we pursue. While my ideas might be a bit opinionated, see
my comments inline.
On Tue, 18 Jun 2019 at 20:47, Kristóf Umann <dkszelethus at gmail.com> wrote:
> Hi!
>
> This is an update on my GSoC project, "Enhancing bug reports in the Clang
> Static Analyzer". We had some discussions on phabricator, and some in
> private meetings, and I think it's due to share how things are looking
> right now.
>
> In my previous mail [1], I detailed the two distinct categories we
> defined, the "must-have-happened" case where every information is already
> in the bug path, and the "should-not-have-happened" case when the
> information isn't found in the bugpath. I divided the latter into two
> subcategories, the "Inlined category" (initially referred to as "category
> 1"), and the "Not inlined category" (initally referred to as "category 2").
>
> These categorizations are used to teach the analyzer incrementally about
> which nodes in the bugpath deserve special attention. For now, I plan to
> use this information exclusively for finding control dependencies to these
> nodes, and tracking their condition. Now, what we mean under "tracking an
> expression value", is that to add notes to the bug report relevant to that
> expression value.
>
> Ultimately, this means that my project depends greatly on condition
> tracking yielding meaningful addition of notes to the bug report, without
> adding unimportant ones. Since I more-or-less finished my work on the
> must-have-happened case (meaning that the analyzer can now figure out
> control dependencies to nodes contained in the bugpath), I'd like to detail
> how I plan to work on this.
>
> While evaluating an early prototype solution to the "must-have-happened"
> case where the same expression value tracking was used for both the
> bug-causing variable and for the conditions, I found that in many cases,
> the growth of bug length was intolerable. This is, in part, caused by
> conditions being tracked to a condition recursively, the conditions of
> asserts being tracked, and that notes about a condition are not as
> interesting as notes about the bug causing variable (calls to operator bool
> for instance).
>
> Fixing any of these requires me to teach the analyzer the difference in
> between "THE value" and "just a condition". The details are a little more
> complicated, so I'll show some examples that point out certain cases:
>
> *Example 1.:*
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05   flag = coin();
>
> 06 }
>
> 07
>
> 08 int main() {
>
> 09   int *x = 0;
>
> 10   foo();
>
> 11   if (flag) // assumed true
>
> 12     *x = 5; // warn
>
> 13 }
>
> In this example, it'd be great to see notes placed on line 10 and 5,
> because if flag wasn't invalidated, the bug would not have occurred (since
> flag is global, and is initialized to 0). The analyzer normally wouldn't
> place notes there, so we definitely should track flag up to line 5.
>
> *Example 2.:*
>
> 01 int flag;
>
> 02 bool coin();
>
> 03
>
> 04 void foo() {
>
> 05   coin();
>
> 06 }
>
> 07
>
> 08 int main() {
>
> 09   int *x = 0;
>
> 10   foo();
>
> 11   if (flag) // assumed true
>
> 12     *x = 5; // warn
>
> 13 }
>
> This case is very similar, with the only difference being that the
> analyzer conservatively assumed that coin may have written flag (as it's a
> global variable). We should track until line 5.
>
I am not sure about this example. While the invalidation of the global flag
is definitely playing a role in this bug report the point where the flag is
actually invalidated could seem quite arbitrary for the user. We might get
the invalidation when we first see a function that is not defined in this
TU, when we reach the maximum call stack depth and so on. My point is, I am
not sure if we actually help the user by pinpointing the first function
call where the invalidation happens. A more user friendly way to think
about this problem is how could the user solve this false positive? As far
as I understand the cannonical solution would be to add an assertion, so it
would be better to put a node close to the place where the user would
change the code, so I would put it in the stack frame of the bug. For
example we could generate a note for the call foo(), that somewhere in that
call stack the analyzer could no longer track the value of flag, thus
invalidated its contents. @Artem Dergachev <adergachev at apple.com> what do
you think?
>
> *Example 3.:*
>
> 01 void f(int flag) {
>
> 02   int *x = 0;
>
> 03   if (flag) // assumed true
>
> 04     *x = 5; // warn
>
> 05 }
>
> Here, the user could simply follow the arrows that shows the path of
> execution: it isn't really important what flag was initialized on the first
> line.
>
> *Example 4.:*
>
> 01 int flag;
>
> 02
>
> 03 int getInt();
>
> 04
>
> 05 int main() {
>
> 06   int *x = 0;
>
> 07   int y = getInt();
>
> 08   flag = y;
>
> 09   if (flag) // assumed true
>
> 10     *x = 5; // warn
>
> 11 }
>
> Again, the user could see there was a write made to flag on line 8 -- the
> question is, whether we should explain y better. Right now, we're thinking
> that we shouldn't, leading to the conclusion that flag here shouldn't be
> tracked at all.
>
Finding the place where flag was modified in a really big function can
still be challenging. While generating a note would be an overkill I wonder
if some middle ground like highlighting the statement without additional
text would actually make sense for this case.
>
> *Example 5.:*
>
> 01 int flag;
>
> 02
>
> 03 int getInt();
>
> 04
>
> 05 void foo() {
>
> 06   int y = getInt();
>
> 07   flag = y;
>
> 08 }
>
> 09
>
> 10 int main() {
>
> 11   int *x = 0;
>
> 12 foo();
>
> 13   if (flag) // assumed true
>
> 14     *x = 5; // warn
>
> 15 }
>
>
> Like Example 1-2, we should explain that flag was written on line 7, but
> like in Example 4., we shouldn't track y.
>
Again, from the user's point of view, I think the ultimate solution would
be to have an interface, where we does not present the part where y was
tracked by default, but the user could click on a button to actually expand
that part of the path in case she consider it interesting. This, of course,
is not part of you GSoC, I am just wondering what is the ideal that we
would like to pursue in the long run.
>
> *Example 6.:*
>
>
> 01 void f(int flag) {
>
> 02   int *x = 0;
>
> 03 assert(flag);
>
> 04   if (flag) // assumed true
>
> 05     *x = 5; // warn
>
> 06 }
>
> Currently, we mention nothing about line 3, yet we say "Taking the true
> branch" on line 4, rather then "Assuming the condition is true". This is
> because the collapse point (point at which we constrain flag's value to be
> true or false) isn't on line 4: the analysis only continued past the assert
> since the analyzer assumed flag to be non-zero. In this case, we would like
> to track flag to the assert to explain why we are so confident on line 5
> that flag is non-zero.
>
What do you mean by tracking tha flag back to the assert? The reason why a
note is useful because in this case either the assert is wrong or the bug
is a true positive. But again, when the assert is in the same function we
might not want to generate additional note for this as this info might
easily be inferred by following the arrows in the report (but we might
highlight the assertion line, see my previous comments). In case the assert
is in a separate (inlined) function it would make much more sense to
generate a note.
>
> *Example 7.:*
>
>
> 01 int getInt();
>
> 02
> 03 void f(int flag) {
>
> 04   int *x = 0;
>
> 05 flag = getInt();
>
> 06 assert(flag);
>
> 07   if (flag) // assumed true
>
> 08     *x = 5; // warn
>
> 09 }
>
> Like Example 6, we should explain why know that flag is non-zero on line 7
> by tracking it back to line 6. Like in the case of Example 4., the user
> could see where flag was written, so we wouldn't like to see a note on line
> 5.
>
> So what's the takeaway?
>
> After teaching the analyzer the difference in between a condition and a
> "regularly tracked expression", I plan to implement the following two rules:
>
> Track a condition only if
> a.) The collapse point doesn't coincide with the condition point
> b.) It was written in a nested stack frame.
>
Do you mane a) or b), or a) and b)? Also, what to do with "multiple
collapse point" events like:
if ( a < 10 ) ;
if ( a < 5) // Here we do add new constraints to a, but we also had other
constraints before. Do you consider this to coincide or not?
 ...
>
> We hope that by implementing this, tracking conditions to conditions would
> be kept at bay without a counter to limit the depth of recursion, and the
> intolerable growth of bug length with drastically shorten. I do expect
> skeletons to fall out of the closet, but I am confident that this is a good
> initial approach.
>
> As explained earlier, this is mission number one, so I'll prioritize
> getting it right before pursuing the "should-not-have-happened" case.
>
> One thing I did not touch on just yet, is the case where an assert was
> (correctly, by the way) regarded as a control dependency, and it's
> condition was tracked. This is undoubtedly undesirable, but figuring out
> whether the condition is found in an assert is rather difficult. Asserts
> are often implemented as a macro, and could have a very, for a lack of a
> better word, esoteric implementations on certain platforms. We discussed
> trying to tinker with the control dependency calculator, namely skipping
> over nodes that have two successors and one of them leads to noreturn node,
> but I still need time to figure out something reliable.
>
Some asserts consists of more basic blocks. Having two successors where one
of them is a noreturn block is not a sufficient condition. Consider for
example assert(a && b); Here the basic block for evaluating a will either
go to the next line after assertion or to the evaluation of b which is not
a noreturn block. While it is true that from node a we will either go to
the next non-assert block or end up in a noreturn block, the noreturn block
might not be an immediate successor of node a.
Regards,
Gabor
>
> Thanks to everyone who helped me along: Artem Dergachev, Gábor Horváth,
> Ádám Balogh, Jakub Kuderski, and Zoltán Porkoláb!
>
> Cheers,
> Kristóf
>
> [1]  http://lists.llvm.org/pipermail/cfe-dev/2019-June/062535.html
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190619/00669ec7/attachment.html>
    
    
More information about the cfe-dev
mailing list