[llvm-dev] [cfe-dev] Soundness in clang SA

Fri Aug 24 16:12:49 PDT 2018

On 8/24/18 5:31 AM, Siddharth Shankar Swain wrote:
> Thanks for the detailed explanation. So is changing the analysis 
> technique from path sensitive (symbolic execution) analysis to some 
> other technique bring in the soundness property ?

Yup, depending on the technique.

And that wouldn't be Analyzer-specific anymore.

> Thanks,
> Siddharth
>
> On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com 
> <mailto:noqnoqneo at gmail.com>> wrote:
>
>     Nope, at least not with the "path-sensitive" (symbolic execution)
>     engine. Which is the whole point of using the Analyzer, as
>     compared to writing an arbitrary tool by exploring the AST
>     yourself. Depends on what you want to check, of course, something
>     really trivial should be possible.
>
>     While we do care about false positives a lot, there are some
>     decisions within the Analyzer's core that prevent perfect
>     soundness from happening but are still considered to be good
>     because they greatly boost our ability to find any bugs at all and
>     at the same time simplify development dramatically.
>
>     One of the most common unsound assumptions that the Analyzer makes
>     is that in a program that looks like this: "if (x) { ... } if (y)
>     { ... }", where 'x' and 'y' are completely unrelated to each
>     other, the Analyzer would explore all four paths (i.e., (x is true
>     or false) times (y is true or false)) as if it's sure that they
>     are feasible. This is not only unsound on its own, but it
>     amplifies other inaccuracies in modeling (eg., if function body is
>     unavailable for analysis, destroying information when such
>     function is called would lead not only to false negatives but to
>     false positives as well). But these false positives are relatively
>     easy to suppress (eg., by adding some sort of assert(x || !y),
>     which is in most cases anyway a good thing to document), and the
>     primary benefit of this simplification is that checkers no longer
>     need to implement a "merge" function for the information that they
>     track in the program state: they only need to update their state
>     by modeling effect of every non-control-flow event in the program.
>     With ~100 checkers already written, limiting them to only simple
>     operations on the program state helps dramatically.
>
>     Additionally, the Analyzer's understanding of the language is not
>     perfect. We're constantly working on this, but languages like C++
>     are *huge*. We have a separation of responsibilities between core
>     and checkers that allow us to write most transfer functions once
>     within the core and have checkers simply consume the results of
>     the modeling (checkers may still want to model their own
>     metadata), but covering all AST nodes and their interactions is
>     still, well, harder than writing a CodeGen. Any bugs in core
>     transfer functions will prevent you from doing a sound analysis,
>     and making sure all such bugs are fixed is probably much more work
>     than whatever you plan to accomplish here.
>
>     On the non-technical side, it is always up to a developer of the
>     program to decide what is a genuine bug. For example, there are a
>     lot of projects on which a memory leak is not considered to be a
>     bug. Or if an application is not security-critical, a null pointer
>     dereference we find may be on such an unlikely path that the
>     developer will only be annoyed to know about it and never really
>     appreciate the report. And the Analyzer doesn't even guarantee
>     that it'd find anything that's more severe than a dead code: after
>     all, the only reason symbolic execution works is "state splits are
>     justified because otherwise it's dead code".
>
>     Last but not least, there's also the "Clang CFG" thing, which the
>     Analyzer uses (and some compiler warnings also use), but it's not
>     used for compilation (LLVM CFG is used instead) and it's not
>     entirely accurate (especially for C++, though it's getting
>     better). Clang CFG is usable for creating usable data flow
>     analyses which you can make as sound as the CFG is accurate.
>     There's a collection of such analyses in lib/Analysis. The
>     Analyzer uses some of them internally and we're very happy with
>     them most of the time. At the same time, there's no fancy
>     framework for creating custom data flow analyses, like there is
>     for the Analyzer's checkers; you'll have to write all transfer
>     functions yourself. So if all you want is a ready-made CFG for C,
>     you should have a look at that.
>
>
>     On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote:
>>     Hi all,
>>
>>     Is it possible to develop a checker  or some feature in clang SA
>>     which will only have perfect soundness property ( if we don't
>>     care about completness property )  i.e if the analyzer says X is
>>     a genuine bug then X is really a genuine bug. Whatever  bug  it
>>     reports are all genuine but it doesn't report all genuine bugs.
>>     Please guide.
>>
>>     Thanks,
>>     Siddharth
>>
>>     _______________________________________________
>>     cfe-dev mailing list
>>     cfe-dev at lists.llvm.org  <mailto:cfe-dev at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev  <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180824/f7388068/attachment.html>