[LLVMdev] Instrumenting C/C++ programs

John Criswell criswell at illinois.edu
Fri Sep 23 10:43:13 PDT 2011

On 9/23/11 12:24 PM, Himanshu Shekhar wrote:
> I just read that the LLVM project can be used to do static analysis on 
> C/C++ code using Clang, the analyzer that is the front end of LLVM. I 
> wanted to know if it is possible to extract all accesses to memory 
> (variables, local as well as global) in the source code using LLVM.

When doing analysis with Clang and LLVM, you first must make a choice 
about which IR to use: Clang's Abstract Syntax Tree (AST) or LLVM's SSA 
Intermediate Representation (IR).  Clang takes source code and converts 
it into an AST; it later takes the AST and converts it to LLVM IR.  LLVM 
then performs mid-level compiler analysis and optimization on code in 
LLVM IR form and then translates from LLVM IR to native code.

Clang ASTs will give you much higher level information than LLVM IR.  On 
the other hand, LLVM IR is probably easier to work with and is 
programming language agnostic.

You might want to read the LLVM Language Reference Manual 
(http://llvm.org/docs/LangRef.html) to get a feel for whether LLVM IR is 
suitable for your analysis.  There may be a similar document for Clang, 
but I'm not familiar with it since I haven't worked with Clang ASTs myself.

> Is there any built-in library in LLVM which I could use to 
> extract this information? If not, please suggest how to write 
> functions to do the same (existing source code, reference, tutorial, 
> example...).

It is easy to write an LLVM pass that plugs into the opt tool and 
searches for explicit accesses to memory.  The LLVM load and store 
instructions access memory (similar to how loads and stores are used to 
access memory in a RISC instruction set).  That said, it is not clear 
whether this is what you want to do.  Some source-level variables are 
translated into one or more SSA virtual registers, so you'll never see a 
load or store to them (as they may never exist in memory but only in 
registers).  Additionally, some loads and stores to memory are not 
visible at the LLVM IR level.  For example, loads and stores to stack 
spill slots are not visible at the LLVM IR level because they're only 
created during code generation (and technically, they're generated in a 
third IR called Machine Instructions that is used specifically for code 
generation).
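
To make the point about SSA registers concrete, here is a minimal sketch 
in standard C++.  It is not an LLVM pass: it only scans textual LLVM IR 
line by line and counts the load and store opcodes.  The two IR snippets 
are hand-written illustrations (written in the newer "ptr" syntax) of a 
small function before and after its locals are promoted to registers, 
and the names countMemOps, kUnoptimizedIR and kPromotedIR are made up 
for this example.  A real pass would walk the instructions through 
LLVM's C++ API instead of parsing text.

```cpp
#include <sstream>
#include <string>

// Number of load and store instructions seen in a piece of textual IR.
struct MemOpCounts {
  int loads = 0;
  int stores = 0;
};

// Toy scanner: looks only at the opcode of each line, so it handles
// "store ..." and "%x = load ..." but knows nothing else about IR.
MemOpCounts countMemOps(const std::string &ir) {
  MemOpCounts counts;
  std::istringstream lines(ir);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream words(line);
    std::string tok;
    if (!(words >> tok)) continue;  // blank line
    if (tok[0] == '%') {
      // "%result = opcode ...": skip the result name and the '='.
      std::string eq;
      if (!(words >> eq >> tok) || eq != "=") continue;
    }
    if (tok == "load") ++counts.loads;
    else if (tok == "store") ++counts.stores;
  }
  return counts;
}

// Hand-written illustration: every local lives in a stack slot.
const std::string kUnoptimizedIR = R"IR(
define i32 @add(i32 %a, i32 %b) {
entry:
  %a.addr = alloca i32
  %b.addr = alloca i32
  %s = alloca i32
  store i32 %a, ptr %a.addr
  store i32 %b, ptr %b.addr
  %0 = load i32, ptr %a.addr
  %1 = load i32, ptr %b.addr
  %add = add nsw i32 %0, %1
  store i32 %add, ptr %s
  %2 = load i32, ptr %s
  ret i32 %2
}
)IR";

// Hand-written illustration: the same function once the locals have
// been promoted to SSA virtual registers.
const std::string kPromotedIR = R"IR(
define i32 @add(i32 %a, i32 %b) {
entry:
  %add = add nsw i32 %a, %b
  ret i32 %add
}
)IR";
```

Calling countMemOps(kUnoptimizedIR) finds 3 loads and 3 stores, while 
countMemOps(kPromotedIR) finds none, even though both snippets describe 
the same source-level variables a, b and s.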

> From what I have studied, I need to first convert the source code into 
> LLVM IR and then write an instrumenting pass which would go over this 
> bitcode file and insert calls to do the analysis, but I don't know 
> exactly how to do it.

The first thing you need to do is figure out which representation of the 
program (Clang ASTs, LLVM IR, LLVM's code generation IR) is the best for 
solving your particular problem.  If you want, you can provide more 
details on what you're trying to do; people on the list can then provide 
feedback on which representation is most suitable for what you want to do.

If you decide to work with LLVM IR, I then recommend reading the "How to 
Write an LLVM Pass" document 
(http://llvm.org/docs/WritingAnLLVMPass.html) as well as the 
Programmer's Manual (http://llvm.org/docs/ProgrammersManual.html).  
The Doxygen documentation is also valuable (http://llvm.org/doxygen/).

For an example of a pass that adds run-time checks to LLVM IR loads and 
stores, look at SAFECode's load/store instrumentation pass.  
It's about as simple as an instrumentation pass gets.
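
As a rough illustration of what such an instrumentation does, here is a 
toy sketch in standard C++ that rewrites textual LLVM IR, inserting a 
call in front of every load and store.  It is not SAFECode's code and 
not an LLVM pass.  The check function @check_access and the helpers 
pointerOperand and instrument are hypothetical names invented for this 
example, and the text matching only understands simple instructions 
written in the newer "ptr" syntax.  A real pass would create the call 
through LLVM's C++ API rather than by editing text.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Returns the address operand (e.g. "%p") of a simple load or store
// line, or "" if the line is not a load/store this toy recognizes.
std::string pointerOperand(const std::string &line) {
  std::istringstream words(line);
  std::string tok;
  if (!(words >> tok)) return "";
  if (tok[0] == '%') {
    // "%result = opcode ...": skip the result name and the '='.
    std::string eq;
    if (!(words >> eq >> tok) || eq != "=") return "";
  }
  if (tok != "load" && tok != "store") return "";
  // In both instructions the address is the last "ptr <name>" operand.
  std::size_t pos = line.rfind("ptr ");
  if (pos == std::string::npos) return "";
  std::size_t start = pos + 4;
  std::size_t end = line.find_first_of(", \t", start);
  return line.substr(start, end - start);
}

// Copies the IR, emitting a call to the hypothetical @check_access
// function before each recognized load or store.
std::string instrument(const std::string &ir) {
  std::istringstream lines(ir);
  std::ostringstream out;
  std::string line;
  while (std::getline(lines, line)) {
    std::string ptr = pointerOperand(line);
    if (!ptr.empty())
      out << "  call void @check_access(ptr " << ptr << ")\n";
    out << line << "\n";
  }
  return out.str();
}
```

For the rewritten IR to be valid, the module would also need a 
declaration such as "declare void @check_access(ptr)", and the function 
itself would have to be supplied by a run-time library of your own.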

-- John T.
> Please suggest how to go about it.
> thanks
> himanshu