[cfe-dev] exhaustiveness of CSA checkers

Wed Jan 15 14:36:27 PST 2020

Yup, I mostly agree with Gabor. If you want to hack on the Clang Static 
Analyzer in order to push it past its default limits, you can try 
bumping the following flags (intended for hacking purposes only!):

   // Interrupts the analysis when a CFG block is visited that many times.
   -analyzer-max-loop=4

   // Interrupts the analysis when ExplodedGraph has that many nodes.
   -analyzer-config max-nodes=225000

and setting the following flags (intended for hacking purposes only!):

   // Allow unrolling loops completely when the concrete bound is known
   // (currently off by default).
   -analyzer-config unroll-loops=true

   // Disables function inlining (you said you don't need IPA).
   -analyzer-config ipa=none

and also removing the artificial loop-unrolling heuristics in 
LoopUnrolling.cpp that attempt to discover whether a loop is statically 
bounded. That would give you the most complete exploration the Static 
Analyzer could ever achieve. There may be more flags that I forgot 
about, but the above should be pretty good.
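For illustration, a minimal C sketch of the kind of code where the loop limit matters. `lock_t`, `try_lock()`, `unlock()` and `sum_then_lock()` are hypothetical stand-ins, not a real API. The loop has a concrete bound of 10, above the `-analyzer-max-loop=4` shown above, so with default settings the analyzer will likely abandon each path inside the loop before it reaches `try_lock()`, and a lock checker never sees the early `return` that leaks the lock. Raising `-analyzer-max-loop`, or enabling `unroll-loops` for a simple counted loop like this one, should let exploration continue past the loop.

```c
#include <stdbool.h>

/* Hypothetical try_lock-style primitive; a stand-in, not a real API. */
typedef struct { bool held; } lock_t;

static bool try_lock(lock_t *l) {
  if (l->held)
    return false; /* already held: acquisition fails */
  l->held = true;
  return true;
}

static void unlock(lock_t *l) { l->held = false; }

/* Hypothetical target function. The loop runs exactly 10 times, more than
   the default -analyzer-max-loop=4, so the analyzer may stop every path
   inside the loop and never reach the try_lock() call below. */
int sum_then_lock(lock_t *l, const int data[10]) {
  int sum = 0;
  for (int i = 0; i < 10; ++i)
    sum += data[i];
  if (try_lock(l)) {
    if (sum < 0)
      return -1; /* deliberate bug: returns with the lock still held */
    unlock(l);
  }
  return sum;
}
```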

It is still impossible to achieve hardcore *verification* this way. The 
Analyzer will occasionally drop execution paths for many other reasons, 
and these reasons are fairly hard to enumerate. For example, it may 
encounter exotic language constructs that it doesn't yet understand, or 
simply become too confused to continue, or an execution path that can 
be taken in reality may look infeasible to the Static Analyzer because 
of a bug in the Analyzer. In the end, you can never be sure that the 
program is definitely correct.

But if you simply want to find "most" of the bugs, for a certain 
definition of "most", the above should do the trick.

On 1/15/20 9:17 PM, Fernandez, Matthew wrote:
>
> Thanks, Gabor. Sounds like this might be beyond CSA’s abilities. 
> Answers to your very apt questions below.
>
> > How much control-flow awareness do you need? Do you really need 
> > path-sensitivity, or is flow-sensitivity sufficient? Or maybe lexical 
> > scoping is enough?
>
> It seems to me currently that we need path-sensitivity. I’ve been 
> exploring some lexical approaches in parallel though, including an 
> RAII-style rephrasing of the code.
>
> > Do you need interprocedural analysis? If so, do you have recursion? 
> > Do you need context sensitivity? Can you add annotations to help guide 
> > the analysis?
>
> No IPA is necessary, and there’s no recursion. In most cases we can 
> ignore context, even to the extent of ignoring the content of a 
> conditional and just noting that there’s a branch in control flow (the 
> exception being when a branch condition depends on the result of a lock 
> acquisition). Annotations are possible, but if it comes to that I would 
> probably look at more drastic refactoring of the code. CSA would not be 
> the only thing consuming this code, so it would still need to be 
> correct and complete without the annotations.
>
> > How complex is the task that you want to accomplish? Are locks 
> > reentrant? Do you have to support more complex try_lock style APIs? Or 
> > is it sufficient to only check the order of the API calls?
>
> The locks are not re-entrant, but their *only* APIs are 
> try_lock-style, so the analysis needs to understand whether lock 
> acquisition succeeded or failed.
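For illustration, a minimal C sketch of the case described above, where a later branch depends on the result of a try_lock-style acquisition. `lock_t`, `try_lock()`, `unlock()` and `read_shared()` are hypothetical stand-ins rather than the real primitives. Because the result is stored in `got` and tested twice, a checker that merges states after the first `if` could report a spurious unlock of a lock that is not held; a path-sensitive checker keeps the success and failure cases apart.

```c
#include <stdbool.h>

/* Hypothetical try_lock-style primitive: returns true only if the lock was
   free and is now held by the caller. */
typedef struct { bool held; } lock_t;

static bool try_lock(lock_t *l) {
  if (l->held)
    return false;
  l->held = true;
  return true;
}

static void unlock(lock_t *l) { l->held = false; }

/* The acquisition result is stored and tested twice. A checker that merges
   the two branches of the first `if` sees "maybe held" at the second one;
   a path-sensitive checker keeps `got` true on one path and false on the
   other, so it can tell that the two tests always agree. */
int read_shared(lock_t *l, const int *shared, int fallback) {
  bool got = try_lock(l);
  int value = fallback;
  if (got)
    value = *shared;
  /* ... work that does not touch the lock ... */
  if (got)
    unlock(l);
  return value;
}
```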
>
> > you could take a look at Thread Safety
>
> Interesting, I was not aware of this. It looks like maybe I can make 
> this work for my purposes. Thanks for the pointer.
>
> *From:* Gábor Horváth <xazax at google.com>
> *Sent:* Wednesday, January 15, 2020 09:15
> *To:* Fernandez, Matthew <matthew.fernandez at intel.com>; Artem 
> Dergachev <noqnoqneo at gmail.com>
> *Cc:* cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] exhaustiveness of CSA checkers
>
> (Adding Artem as he is very knowledgeable in this topic)
>
> Oh, I see. If you know that there is a bounded number of paths, it is 
> not entirely unreasonable to use symbolic execution to achieve what you 
> want.
>
> Unfortunately, this is not a use case that the static analyzer was 
> designed for. I think it should be possible to tweak it, but I have no 
> idea how much work that would be.
>
> But even though it might be possible to tweak the analyzer, I am not 
> sure whether this would be the right thing to do. Some questions that 
> might help:
>
> 1. How much control-flow awareness do you need? Do you really need 
> path-sensitivity, or is flow-sensitivity sufficient? Or maybe lexical 
> scoping is enough?
>
> You only need a path-sensitive check if you want to avoid false 
> positives of the form:
>
> if (cond)
>   lock();
>
> // ...
>
> if (cond)
>   unlock();
>
> It looks like you already have some constraints on the coding style in 
> the code you want to check, so I guess there is a chance that users are 
> not allowed to do locking using complex patterns like the one above. If 
> that is the case, flow-sensitive analysis might be a better fit, as it 
> is easier to make exhaustive and will perform much better.
>
> Or, if RAII-style locking would be sufficient but you do not have 
> destructors in C, you can have syntactic checks that enforce 
> hand-written RAII-style resource management.
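A minimal sketch of what hand-written RAII could look like in C, assuming hypothetical `lock_t`/`try_lock()`/`unlock()` stand-ins. The lock is acquired at one point and released only at a single `out:` label, so a purely syntactic check could verify the shape (for example, that every exit after a successful acquisition goes through the label) without tracking paths. The single-label convention is an assumption for illustration, not something the analyzer provides.

```c
#include <stdbool.h>

/* Hypothetical try_lock-style primitive; a stand-in, not a real API. */
typedef struct { bool held; } lock_t;

static bool try_lock(lock_t *l) {
  if (l->held)
    return false;
  l->held = true;
  return true;
}

static void unlock(lock_t *l) { l->held = false; }

/* Acquire once at the top, release once at the `out:` label. Every path
   that acquired the lock leaves through the label. */
int update(lock_t *l, int *shared, int delta) {
  int rc = -1;
  if (!try_lock(l))
    return rc; /* nothing acquired, nothing to release */
  if (delta < 0)
    goto out; /* early exit still passes through the release */
  *shared += delta;
  rc = 0;
out:
  unlock(l);
  return rc;
}
```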
>
> 2. Do you need interprocedural analysis? If so, do you have recursion? 
> Do you need context sensitivity? Can you add annotations to help guide 
> the analysis?
>
> 3. How complex is the task that you want to accomplish? Are locks 
> reentrant? Do you have to support more complex try_lock style APIs? Or 
> is it sufficient to only check the order of the API calls?
>
> If you can add annotations and you do not need path sensitivity, you 
> could take a look at Thread Safety Analysis: 
> https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
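A rough sketch of how a try_lock-style API might be annotated for Thread Safety Analysis, following the macro pattern from that page. The lock type and the names `lock_try()`, `lock_release()` and `bump_counter()` are hypothetical, and I am assuming the attributes are accepted when Clang compiles C; with other compilers the macros expand to nothing. With `clang -Wthread-safety`, the analysis should accept `bump_counter()` because `counter` is touched only on the branch where `lock_try()` reported success.

```c
#include <stdbool.h>

/* Emit Clang thread-safety attributes only under Clang; elsewhere the
   annotations compile away. */
#if defined(__clang__)
#define TSA_ATTR(x) __attribute__((x))
#else
#define TSA_ATTR(x)
#endif

#define CAPABILITY(name) TSA_ATTR(capability(name))
#define GUARDED_BY(x) TSA_ATTR(guarded_by(x))
#define TRY_ACQUIRE(...) TSA_ATTR(try_acquire_capability(__VA_ARGS__))
#define RELEASE(...) TSA_ATTR(release_capability(__VA_ARGS__))
#define NO_TSA TSA_ATTR(no_thread_safety_analysis)

/* Hypothetical lock type, marked as a capability. */
typedef struct CAPABILITY("mutex") lock { bool held; } lock_t;

/* Returns true (and then holds *l) on success, false otherwise. */
bool lock_try(lock_t *l) TRY_ACQUIRE(true, l);
void lock_release(lock_t *l) RELEASE(l);

/* The primitives themselves are exempt from the analysis. */
NO_TSA bool lock_try(lock_t *l) {
  if (l->held)
    return false;
  l->held = true;
  return true;
}

NO_TSA void lock_release(lock_t *l) { l->held = false; }

static lock_t counter_lock;
static int counter GUARDED_BY(counter_lock);

/* Touches counter only on the branch where lock_try() succeeded. */
bool bump_counter(void) {
  if (!lock_try(&counter_lock))
    return false;
  ++counter;
  lock_release(&counter_lock);
  return true;
}
```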
>
> Cheers,
>
> Gabor
>
> On Wed, Jan 15, 2020 at 8:48 AM Fernandez, Matthew 
> <matthew.fernandez at intel.com> wrote:
>
> Hi Gabor,
>
> Thanks for your reply. The checker I’m implementing is similar to 
> PthreadLockChecker. It knows the correct acquire/release patterns for 
> certain primitives and checks for them. If the analysis fails to reach 
> the end of a function, the checker cannot warn about, e.g., unreleased 
> locks.
>
> This is a somewhat unorthodox case, as I know the target code to which 
> this will be applied. All functions are under 500 LoC and the only 
> loops are statically bounded. It can be established statically that all 
> functions terminate and that there is a finite number of paths.
>
> I was hoping to use CSA for this because it handles path enumeration 
> and constructing the exploded graph very nicely. Someone suggested to 
> me I might have to move to KLEE, but that would be a shame because I’d 
> need to introduce some code instrumentation/annotation to achieve what 
> I want. Another option would be to use an AST visitor to enumerate the 
> paths myself, but it would be nice to leverage LLVM’s existing 
> functionality for this.
>
> Thanks,
>
> Matthew
>
> *From:* Gábor Horváth <xazax at google.com>
> *Sent:* Wednesday, January 15, 2020 08:13
> *To:* Fernandez, Matthew <matthew.fernandez at intel.com>
> *Cc:* cfe-dev at lists.llvm.org
> *Subject:* Re: [cfe-dev] exhaustiveness of CSA checkers
>
> Hi!
>
> The Clang Static Analyzer does not give you any guarantees regarding 
> coverage or exhaustiveness. There is no way to ensure exhaustive 
> analysis (such an analysis is likely to be unbounded for most 
> non-trivial programs, so this is not only about runtime but also about 
> termination). For this reason, all checks have to be implemented with 
> non-exhaustiveness in mind.
>
> Could you share what you are trying to achieve? Maybe symbolic 
> execution is not the right tool for that problem.
>
> Cheers,
>
> Gabor
>
> On Wed, Jan 15, 2020 at 12:58 AM Fernandez, Matthew via cfe-dev 
> <cfe-dev at lists.llvm.org> wrote:
>
> Hello cfe-dev,
>
> In prototyping a custom checker for the Clang Static Analyzer, I’ve 
> found that analysis terminates at some complexity limit. That is, when 
> the target function exceeds some complexity bound, CSA stops path 
> traversal and the checker does not receive callbacks for any remaining 
> unvisited nodes. The two specific scenarios where I’ve run into this are 
> high-iteration-count loops and complex conditionals (multiple 
> short-circuiting && and || operators). I can work around the first by 
> rephrasing the target loops or by using something like 
> -analyzer-max-loop, but I can’t find a way to affect the behavior of 
> the second. To compound the situation, I cannot see how the checker 
> can detect that path exploration was incomplete.
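A back-of-the-envelope sketch of why chained short-circuit conditions are hard on a path-sensitive engine, under a simplified model (independent unknown operands, no infeasible combinations; `path_count()` is a hypothetical helper). An `if (c1 && ... && cm)` can become false at any of its m operands or be true overall, so it has m + 1 outcomes, and k such statements in sequence give (m + 1)^k distinct paths. The analyzer generally cannot merge these paths because each carries different constraints on the operands, so they quickly eat into a node budget such as `max-nodes`.

```c
/* Hypothetical helper: counts distinct paths through k sequential
   `if (c1 && c2 && ... && cm)` statements, assuming every operand is an
   independent unknown. Each condition can turn false at any of its m
   operands or be true overall: m + 1 outcomes per statement. */
unsigned long long path_count(unsigned m, unsigned k) {
  unsigned long long paths = 1;
  for (unsigned i = 0; i < k; ++i)
    paths *= (unsigned long long)m + 1;
  return paths;
}
```

For example, eight such statements with three operands each already give `path_count(3, 8)`, that is 4^8 = 65536 paths.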
>
> Is there a way to control the complexity limit enforced for 
> conditionals? Or, failing that, to detect within the checker when path 
> exploration was incomplete?
>
> To give some more context, my checker is an experiment and not 
> something I intend to upstream. Runtime is not an issue; I am fine 
> with the analyzer taking multiple hours for a single run. That said, I 
> understand why the existing CSA bounds were chosen, as most users do 
> not want their compiler to run for this long.
>
> Please CC me in replies as I’m not subscribed.
>
> Thanks,
>
> Matthew
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>