[llvm-dev] [cfe-dev] Soundness in clang SA

Siddharth Shankar Swain via llvm-dev llvm-dev at lists.llvm.org
Sat Aug 25 23:18:23 PDT 2018


Hi,
What all methods are possible for program analysis? There are :-
1) Flow sensitive analysis
2) Path Sensitive analysis ( Symbolic execution )
3) Context sensitive analysis

Any other current research in this field ? Please guide

Thanks,
Siddharth

On Sat, Aug 25, 2018 at 4:42 AM, Artem Dergachev <noqnoqneo at gmail.com>
wrote:

>
>
> On 8/24/18 5:31 AM, Siddharth Shankar Swain wrote:
>
> Thanks for the detailed explanation. So is changing the analysis technique
> from path sensitive (symbolic execution) analysis to some other technique
> bring in the soundness property ?
>
>
> Yup, depending on the technique.
>
> And that wouldn't be Analyzer-specific anymore.
>
> Thanks,
> Siddharth
>
> On Sat, Aug 18, 2018 at 1:27 AM, Artem Dergachev <noqnoqneo at gmail.com>
> wrote:
>
>> Nope, at least not with the "path-sensitive" (symbolic execution) engine.
>> Which is the whole point of using the Analyzer, as compared to writing an
>> arbitrary tool by exploring the AST yourself. Depends on what you want to
>> check, of course, something really trivial should be possible.
>>
>> While we do care about false positives a lot, there are some decisions
>> within the Analyzer's core that prevent perfect soundness from happening
>> but are still considered to be good because they greatly boost our ability
>> to find any bugs at all and at the same time simplify development
>> dramatically.
>>
>> One of the most common unsound assumptions that the Analyzer makes is
>> that in a program that looks like this: "if (x) { ... } if (y) { ... }",
>> where 'x' and 'y' are completely unrelated to each other, the Analyzer
>> would explore all four paths (i.e., (x is true or false) times (y is true
>> or false)) as if it's sure that they are feasible. This is not only unsound
>> on its own, but it amplifies other inaccuracies in modeling (eg., if
>> function body is unavailable for analysis, destroying information when such
>> function is called would lead not only to false negatives but to false
>> positives as well). But these false positives are relatively easy to
>> suppress (eg., by adding some sort of assert(x || !y), which is in most
>> cases anyway a good thing to document), and the primary benefit of this
>> simplification is that checkers no longer need to implement a "merge"
>> function for the information that they track in the program state: they
>> only need to update their state by modeling effect of every
>> non-control-flow event in the program. With ~100 checkers already written,
>> limiting them to only simple operations on the program state helps
>> dramatically.
>>
>> Additionally, the Analyzer's understanding of the language is not
>> perfect. We're constantly working on this, but languages like C++ are
>> *huge*. We have a separation of responsibilities between core and checkers
>> that allow us to write most transfer functions once within the core and
>> have checkers simply consume the results of the modeling (checkers may
>> still want to model their own metadata), but covering all AST nodes and
>> their interactions is still, well, harder than writing a CodeGen. Any bugs
>> in core transfer functions will prevent you from doing a sound analysis,
>> and making sure all such bugs are fixed is probably much more work than
>> whatever you plan to accomplish here.
>>
>> On the non-technical side, it is always up to a developer of the program
>> to decide what is a genuine bug. For example, there are a lot of projects
>> on which a memory leak is not considered to be a bug. Or if an application
>> is not security-critical, a null pointer dereference we find may be on such
>> an unlikely path that the developer will only be annoyed to know about it
>> and never really appreciate the report. And the Analyzer doesn't even
>> guarantee that it'd find anything that's more severe than a dead code:
>> after all, the only reason symbolic execution works is "state splits are
>> justified because otherwise it's dead code".
>>
>> Last but not least, there's also the "Clang CFG" thing, which the
>> Analyzer uses (and some compiler warnings also use), but it's not used for
>> compilation (LLVM CFG is used instead) and it's not entirely accurate
>> (especially for C++, though it's getting better). Clang CFG is usable for
>> creating usable data flow analyses which you can make as sound as the CFG
>> is accurate. There's a collection of such analyses in lib/Analysis. The
>> Analyzer uses some of them internally and we're very happy with them most
>> of the time. At the same time, there's no fancy framework for creating
>> custom data flow analyses, like there is for the Analyzer's checkers;
>> you'll have to write all transfer functions yourself. So if all you want is
>> a ready-made CFG for C, you should have a look at that.
>>
>>
>> On 8/17/18 4:35 AM, Siddharth Shankar Swain via cfe-dev wrote:
>>
>> Hi all,
>>
>> Is it possible to develop a checker  or some feature in clang SA which
>> will only have perfect soundness property ( if we don't care about
>> completness property )  i.e if the analyzer says X is a genuine bug then X
>> is really a genuine bug. Whatever  bug  it reports are all genuine but it
>> doesn't report all genuine bugs. Please guide.
>>
>> Thanks,
>> Siddharth
>>
>> _______________________________________________
>> cfe-dev mailing listcfe-dev at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180826/2be6ad1e/attachment.html>


More information about the llvm-dev mailing list