[cfe-dev] Blended analysis using Clang

Tue Oct 11 06:58:18 PDT 2016

> On Oct 11, 2016, at 5:15 AM, Eric Williamson via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Hello,
> 
> I am looking to conduct a blended analysis of c programs, with respect to certain invariants, using clang. I was wondering if there are been similar projects that have been previously completed.
> 
> As we have the source of these programs, my initial attempts have led me to using the clang static analyzer to statically assert as many of the invariants as I can using a path sensitive analysis, and identifying places to identify dynamic checks for certain passes where the invariant is unknown if violated. 

One thing to be aware of is that the clang static analyzer’s symbolic execution is not exhaustive. It can (and will) drop flows, it will stop unrolling loops, etc. and in general is under-approximate.

> The only way I have seen to identify these points of interest is to use a Rewriter instance to leave a marker comment at that source location where I want to insert dynamic checks. Is there a more standard way of communicating these locations for dynamic checks out from the static analyzer?

One  possibility is to insert a call to a function or builtin with a well-known name (for example “my_assert()”) and either link in an implementation of the function to be called at run time or write an LLVM pass to replace the call with instrumentation instructions. The analyzer itself doesn’t have any rewriting functionality.

> After those calls have been properly communicated I would like to use some method to embed checks into the final executable. From my understanding I can do this through a source to source translation with a Rewriter, with an AST modification, or at the IR level. Do you have any advice on what would be best to embed a variety of dynamic checks into a many different c programs?

In my experience source-to-source translation can be quite a lot of work — depending on your needs — because it often requires looking at surrounding context to insert instrumentation correctly. For example, suppose you have a function call that takes an expression with a side effect and you want your instrumentation to have access to that expression’s value. If you want to use that value in your instrumentation and also pass it to the called function you may need to store it first in a temporary. Correctly inserting a temporary local variable for a given expression at the AST level can require handling a lot of cases! Often it is easier to add the instrumentation at the IR level. The downside is that it may be more difficult or impossible to recover information that is obvious at the AST level but lowered away in LLVM IR.

Devin