[llvm-dev] [cfe-dev] Soundness in clang SA
Artem Dergachev via llvm-dev
llvm-dev at lists.llvm.org
Fri Aug 17 12:57:56 PDT 2018
Nope, at least not with the "path-sensitive" (symbolic execution)
engine. Which is the whole point of using the Analyzer, as compared to
writing an arbitrary tool by exploring the AST yourself. Depends on what
you want to check, of course, something really trivial should be possible.
While we do care about false positives a lot, there are some decisions
within the Analyzer's core that prevent perfect soundness from happening
but are still considered to be good because they greatly boost our
ability to find any bugs at all and at the same time simplify
development dramatically.
One of the most common unsound assumptions that the Analyzer makes is
that in a program that looks like this: "if (x) { ... } if (y) { ... }",
where 'x' and 'y' are completely unrelated to each other, the Analyzer
would explore all four paths (i.e., (x is true or false) times (y is
true or false)) as if it's sure that they are feasible. This is not only
unsound on its own, but it amplifies other inaccuracies in modeling
(eg., if function body is unavailable for analysis, destroying
information when such function is called would lead not only to false
negatives but to false positives as well). But these false positives are
relatively easy to suppress (eg., by adding some sort of assert(x ||
!y), which is in most cases anyway a good thing to document), and the
primary benefit of this simplification is that checkers no longer need
to implement a "merge" function for the information that they track in
the program state: they only need to update their state by modeling
effect of every non-control-flow event in the program. With ~100
checkers already written, limiting them to only simple operations on the
program state helps dramatically.
Additionally, the Analyzer's understanding of the language is not
perfect. We're constantly working on this, but languages like C++ are
*huge*. We have a separation of responsibilities between core and
checkers that allow us to write most transfer functions once within the
core and have checkers simply consume the results of the modeling
(checkers may still want to model their own metadata), but covering all
AST nodes and their interactions is still, well, harder than writing a
CodeGen. Any bugs in core transfer functions will prevent you from doing
a sound analysis, and making sure all such bugs are fixed is probably
much more work than whatever you plan to accomplish here.
On the non-technical side, it is always up to a developer of the program
to decide what is a genuine bug. For example, there are a lot of
projects on which a memory leak is not considered to be a bug. Or if an
application is not security-critical, a null pointer dereference we find
may be on such an unlikely path that the developer will only be annoyed
to know about it and never really appreciate the report. And the
Analyzer doesn't even guarantee that it'd find anything that's more
severe than a dead code: after all, the only reason symbolic execution
works is "state splits are justified because otherwise it's dead code".
Last but not least, there's also the "Clang CFG" thing, which the
Analyzer uses (and some compiler warnings also use), but it's not used
for compilation (LLVM CFG is used instead) and it's not entirely
accurate (especially for C++, though it's getting better). Clang CFG is
usable for creating usable data flow analyses which you can make as
sound as the CFG is accurate. There's a collection of such analyses in
lib/Analysis. The Analyzer uses some of them internally and we're very
happy with them most of the time. At the same time, there's no fancy
framework for creating custom data flow analyses, like there is for the
Analyzer's checkers; you'll have to write all transfer functions
yourself. So if all you want is a ready-made CFG for C, you should have
a look at that.
On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote:
> Hi all,
>
> Is it possible to develop a checker or some feature in clang SA which
> will only have perfect soundness property ( if we don't care about
> completness property ) i.e if the analyzer says X is a genuine bug
> then X is really a genuine bug. Whatever bug it reports are all
> genuine but it doesn't report all genuine bugs. Please guide.
>
> Thanks,
> Siddharth
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180817/06406565/attachment-0001.html>
More information about the llvm-dev
mailing list