[cfe-dev] [analyzer] Speaking about reaching definitions...

Wed Jul 17 08:56:57 PDT 2019

Just to be clear I am not against having more heuristics to suppress false
positives, I just have some concerns around some scenarios:
1.) Debugging: it will get harder and harder to actually write examples
that trigger the desired parts of the analyzer unless we turn some of the
heuristics off, but this of course only a minor concern.
2.) Probing: When someone evaluates the analyzer she is likely to do it
using little code examples. While I do agree that the analyzer should be
optimized for real world code I think it is good if it works for silly
little examples as well to give a good impression for the users. When we
writes such examples we often do not notice subtle properties of the code
like the one you written before. And the user might have the impression
that the analyzer is unable to catch those simple examples.
3.) Fixing reports: how can a user be user if she really fixed a problem or
only triggered yet another suppression heuristics?
4.) Introducing reports: changing seemingly unrelated parts of the code can
"untrigger" suppressions introducing new reports and the user might wonder
what change triggered the new warnings and why.

Of course 3) and 4) are already kind of inherent properties of symbolic
execution as the analysis coverage can change pretty easily by slightly
changing the code but I am still a bit uncomfortable with making this worse
without being able to give a good explanation what is going on.

All in all I think a heuristic like this is a good idea if we can give the
users a good and intuitive description why the bug was suppressed in a code
snippet. And that description should not require the knowledge of reaching
definitions or other similar concepts. So I think if we can easily extend
the FAQ with such information I am on board with this change. For example,
inline defensive check suppression is easy to explain, we can tell the
users the analyzer ignore some checks in the callees since they might only
be applicable for other callers. If we can have a similar clear description
for this one let's move ahead. What do you think?

Taking another look at the example:
It is only false positive if bar never returns null. Letting the analyzer
know that this is the case either with an assertion or an annotation makes
the issue disappear. I guess this is a question how static analysis
friendly one is. But my preferred solution in this case would be to let the
analyzer know the invariants of the code rather than suppressing the
report. But of course, to truly know the usefulness of the heuristics we
need to try it on real world code :)

On Wed, 17 Jul 2019 at 02:03, Kristóf Umann via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I talked with Gábor, and am shamelessly stealing his comment, but I'm
> doing this for the greater good so it's not forgotten :^)
>
> I don't immediately see anything wrong with this heuristic -- I mean, we
> have to see how it behaves, but let's presume it works. It's obvious though
> we would sometimes suppress true positives. How many similar suppressing
> tricks do we have? Does this mean that a small change in my code in order
> to solve a bug found by static analysis could trigger some suppression and
> "untrigger" others? Should I expect seemingly random results after minimal
> changes? I guess there is an argument to be made with also being
> conservative with how many suppression techniques do we have, and how would
> they interact.
>
> On Fri, 28 Jun 2019 at 02:21, Artem Dergachev via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> We do occasionally make things configurable but this particular situation
>> strikes me as something that i'd find hard to explain to the users if i add
>> such option. I guess we can eventually add a global "optimism level" option
>> that would tweak a lot of such behaviors.
>>
>> But even when we do that, we'll be pretty far away from a verification
>> machine. Even in the most optimistic mode we won't give any guarantees that
>> we'll prevent all the bugs of the given kind, so for a seriously critical
>> piece of code you'll have to use other tools.
>>
>> On 6/27/19 3:13 PM, Phil King wrote:
>>
>> Would it make sense to allow this sort of behaviour to be configurable?
>>
>> For example, much of the time I might not want to be nagged with “this
>> may be a problem” and would like a pragmatic approach, but if I’m writing
>> some critical code I would like to know “this cannot be proven to be
>> correct” and would like the check to be pessimistic.
>>
>> These different use-cases can also be adopted when checking legacy code
>> (pragmatic) or new code (pessimistic).
>>
>> For the pessimistic case, there is still the chance to use information
>> about bar() to drop the warning if it can be shown never to yield a null
>> pointer.
>>
>> Sent from my iPhone
>>
>> On 27 Jun 2019, at 22:55, Artem Dergachev via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> Yeah, i mean, we cannot be sure, therefore we want to be conservative and
>> not bother the user with a warning. It might be a true positive, but it's
>> very wonky and there's nothing in the code that indicates that bar() may
>> return null; the code makes perfect sense even if bar() doesn't ever return
>> null.
>>
>> On 6/27/19 2:49 PM, Gábor Horváth wrote:
>>
>> I am not sure I follow why do we think that the second example is a false
>> positive.
>> I think it depends on the user intent. If the user wanted to check if b
>> was reassigned (i.e. checking for the source of the value), and bar never
>> returns a null pointer than it is definitely a false positive. But we
>> cannot be sure what the intent of the check was. What if the user just
>> wanted to check the value regardless of its source.
>>
>> On Thu, 27 Jun 2019 at 13:56, Artem Dergachev <noqnoqneo at gmail.com>
>> wrote:
>>
>>> This is very loosely related to Kristof's GSoC and this is my favorite
>>> subject: weird assumption chains.
>>>
>>> Consider:
>>>
>>>    void foo1() {
>>>      int *a = bar();
>>>      int *b = a;
>>>      if (b) { /* ... */ }
>>>      *a = 1;
>>>    }
>>>
>>> This is a valid null dereference bug. Like, 'b' is probably null
>>> (otherwise why check?), therefore 'a', which is equal to 'b', may also
>>> be null.
>>>
>>> Now consider:
>>>
>>>    void foo2() {
>>>      int *a = bar();
>>>      int *b = nullptr;
>>>      if (coin()) {
>>>        b = a;
>>>      }
>>>      if (b) { /* ... */ }
>>>      *a = 1;
>>>    }
>>>
>>> In foo2 we will report a null dereference as well, however the null
>>> check for 'b' is well-justified even if bar() never returns null,
>>> therefore it's a false positive.
>>>
>>> How 'bout we suppress the null dereference warning when the
>>> reaching-definition analysis for 'b' that starts at 'if (b)' - i.e. at
>>> the collapse point - yields multiple definitions and some of them is a
>>> plain null?
>>>
>>> Note that the plain-null definition would never be a part of the bug
>>> path because it would not have a corresponding collapse point (it's
>>> already a concrete null).
>>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20190717/958cbe8a/attachment.html>