[cfe-dev] Dataflow analysis with LLVM/Clang

Wed Oct 1 13:19:20 PDT 2008

On Oct 1, 2008, at 11:47 AM, Mike Stump wrote:

> On Oct 1, 2008, at 6:52 AM, João Paulo Rechi Vita wrote:
>> I'm working on an MSc project and I need to detect conflicting
>> actions between different threads in a program, through static
>> analysis.
>
> My take: if you want to use clang's Analysis engine
> (include/Analysis), you can't avoid using clang.  It isn't clear to
> me from your description whether you need it, however.  If all you
> want to do is insert code and do some trivial analysis, and llvm
> bitcode contains everything you need to do your work, then you'd
> probably want to just stick with llvm.
>
> If you gave an example of the most complex reasoning you want to
> perform, that might help us tell you what part of llvm/clang can help
> the most.

I think Mike's comments are pretty much spot on.  To me this really
amounts to listing your requirements and what you are trying to
accomplish.  At a high level, it sounds like what you want to do is
program transformation.  If the goal of the transformation is to
change runtime behavior, then you can perform the transformation
either at the LLVM IR level or by rewriting source code using Clang.
If the goal is to modify the original source code so that users end up
working with an instrumented source file, then this has to be done
using Clang.

I'm going to assume that your goal is simply to modify runtime
behavior.  If that is the case, my gut feeling is that it is better to
do it at the LLVM IR level, provided you don't require any specific
knowledge about C.  The lowered representation of the LLVM IR
abstracts away details of the high-level language that may be
superfluous for your task; C is a "rich" language with many
constructs, so an analysis at the source level would have to reason
about many edge cases.  There are many other tradeoffs that we can go
into if you are interested.

I think the other thing to keep in mind is how the concurrency
primitives whose uses you want to monitor are represented both in
Clang's AST and in the LLVM IR.  If you can easily identify when such
primitives are used at the LLVM IR level, then doing your
transformations there makes the most sense to me (given what I know
about what you are trying to do).
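
To make the "can you easily identify the primitives" question concrete,
here is a rough sketch rather than a real LLVM pass.  It assumes the
primitives are plain pthread calls and that you have textual IR (a .ll
file) in the current syntax; both are my assumptions, not something
stated in this thread.  The PRIMITIVES set, the find_primitive_calls
helper, and the sample IR are hypothetical names made up for
illustration.

```python
import re

# Hypothetical set of primitives to look for; adjust to your project.
PRIMITIVES = {
    "pthread_create",
    "pthread_join",
    "pthread_mutex_lock",
    "pthread_mutex_unlock",
}

# Matches a direct call/invoke and captures the callee name.
CALL_RE = re.compile(r"\b(?:call|invoke)\b[^@]*@([\w.$]+)\s*\(")
# Matches the start of a function definition and captures its name.
DEFINE_RE = re.compile(r"^define\b[^@]*@([\w.$]+)\s*\(")


def find_primitive_calls(ir_text):
    """Return (enclosing_function, primitive) pairs in textual order."""
    results = []
    current = None
    for line in ir_text.splitlines():
        stripped = line.strip()
        m = DEFINE_RE.match(stripped)
        if m:
            current = m.group(1)
            continue
        if stripped == "}":
            current = None
            continue
        m = CALL_RE.search(stripped)
        if m and m.group(1) in PRIMITIVES:
            results.append((current, m.group(1)))
    return results


# Hand-written sample IR, not produced by a real compile.
SAMPLE_IR = """
declare i32 @pthread_mutex_lock(ptr)
declare i32 @pthread_mutex_unlock(ptr)

define i32 @worker(ptr %m) {
entry:
  %r = call i32 @pthread_mutex_lock(ptr %m)
  store i32 1, ptr @shared
  %u = call i32 @pthread_mutex_unlock(ptr %m)
  ret i32 0
}
"""

if __name__ == "__main__":
    for func, prim in find_primitive_calls(SAMPLE_IR):
        print(f"{func}: {prim}")
```

If the primitives show up as direct calls like this, they are easy to
spot in the IR.  Indirect calls through function pointers, or
primitives hidden behind macros or wrappers, would not be caught by
this kind of matching, which is one of the tradeoffs to weigh.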

I'm not 100% certain how you want to use line information.
Certainly Clang has rich information about the locations of  
expressions within a source file, but LLVM IR can capture some  
debugging information that may be useful for constructing the line  
information you need (others can chime in here, since I'm not an  
expert on this topic).  It all depends on what you are trying to do.
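
As a rough illustration of recovering line numbers from the IR side,
the sketch below pairs each direct call with the source line recorded
in its debug location.  It assumes textual IR built with debug info in
the current metadata syntax, where instructions carry "!dbg !N" and
"!N = !DILocation(line: ..., ...)" records the position; the helper
name call_lines and the sample IR are hypothetical, and how well this
fits your use case is something others may know better than I do.

```python
import re

# "!12 = !DILocation(line: 7, column: 3, scope: !9)" -> id "12", line 7
LOC_RE = re.compile(r"^!(\d+)\s*=\s*!DILocation\(line:\s*(\d+)")
# "..., !dbg !12" attached to an instruction -> id "12"
DBG_RE = re.compile(r"!dbg\s+!(\d+)")
# Direct call and its callee name.
CALLEE_RE = re.compile(r"\bcall\b[^@]*@([\w.$]+)\s*\(")


def call_lines(ir_text):
    """Return (callee, source_line) for each call with a debug location."""
    # First pass: collect metadata id -> source line.
    locations = {}
    for line in ir_text.splitlines():
        m = LOC_RE.match(line.strip())
        if m:
            locations[m.group(1)] = int(m.group(2))

    # Second pass: resolve each call's !dbg attachment.
    out = []
    for line in ir_text.splitlines():
        callee = CALLEE_RE.search(line)
        dbg = DBG_RE.search(line)
        if callee and dbg and dbg.group(1) in locations:
            out.append((callee.group(1), locations[dbg.group(1)]))
    return out


# Hand-written sample IR fragment, not produced by a real compile.
SAMPLE_IR_DBG = """
define i32 @worker(ptr %m) !dbg !9 {
entry:
  %r = call i32 @pthread_mutex_lock(ptr %m), !dbg !12
  %u = call i32 @pthread_mutex_unlock(ptr %m), !dbg !14
  ret i32 0, !dbg !15
}

!12 = !DILocation(line: 7, column: 3, scope: !9)
!14 = !DILocation(line: 9, column: 3, scope: !9)
!15 = !DILocation(line: 10, column: 3, scope: !9)
"""

if __name__ == "__main__":
    for callee, line_no in call_lines(SAMPLE_IR_DBG):
        print(f"{callee} at line {line_no}")
```

This only gives you lines (and columns, if you parse them), which is
much coarser than the expression-level source ranges Clang keeps in
its AST.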