[LLVMdev] summer of code idea — checking bounds overflow bugs

Tue Mar 30 21:07:09 PDT 2010

罗勇刚(Yonggang Luo) wrote:
> Sounds like a good idea. Does that mean lowering the SAFECode project
> from the higher level (Clang) to a lower level, for more general work on
> bounds checking?

SAFECode has always worked on the LLVM IR.

What I am saying is that my preference is to have LLVM passes that do 
static array bounds checking instead of Clang passes that do static 
array bounds checking.  The problem that I see with implementing static 
array bounds checking in Clang is that it benefits only languages 
utilizing Clang's libraries.  That means that VMKit, llvm-gcc/g++, and 
other potential frontends can't benefit from it.  SAFECode won't derive 
any benefit except when it is used in conjunction with Clang.  That's 
okay but not ideal.

Also, SAFECode, being a set of LLVM passes, can consume the results of 
other LLVM passes more easily than those of Clang passes.  If static 
array bounds checking were implemented in 
Clang, then a Clang-based transform would need to insert information 
into the LLVM IR to communicate to SAFECode which GEP instructions 
stayed within bounds.  If static array bounds checking is implemented as 
an LLVM pass, then SAFECode will just need to add it as a prerequisite 
and query the results.
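
To make the "add it as a prerequisite and query the results" arrangement 
concrete, here is a toy sketch in plain C++.  It deliberately does not use 
the LLVM pass API; ToyGEP, ToyStaticBoundsAnalysis, and 
gepsNeedingRuntimeCheck are hypothetical names invented for illustration, 
and the "analysis" only handles constant indices into fixed-size arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a GEP that indexes a fixed-size array.
// NOT the LLVM API: every name in this sketch is hypothetical.
struct ToyGEP {
    std::uint64_t arrayLength;   // number of elements in the indexed array
    bool indexIsConstant;        // true if the index is a compile-time constant
    std::int64_t constantIndex;  // meaningful only when indexIsConstant is true
};

// Plays the role of the analysis pass: for each GEP, records whether it
// can be proven to stay within bounds.
class ToyStaticBoundsAnalysis {
public:
    explicit ToyStaticBoundsAnalysis(const std::vector<ToyGEP> &geps) {
        for (const ToyGEP &g : geps) {
            bool safe = g.indexIsConstant && g.constantIndex >= 0 &&
                        static_cast<std::uint64_t>(g.constantIndex) < g.arrayLength;
            provenSafe.push_back(safe);
        }
    }

    // Query interface that a client pass would call.
    bool isProvenInBounds(std::size_t gepIndex) const {
        assert(gepIndex < provenSafe.size());
        return provenSafe[gepIndex];
    }

private:
    std::vector<bool> provenSafe;
};

// Plays the role of the client (e.g. a run-time check inserter): it runs the
// analysis first and returns the positions of GEPs that still need a check.
std::vector<std::size_t> gepsNeedingRuntimeCheck(const std::vector<ToyGEP> &geps) {
    ToyStaticBoundsAnalysis analysis(geps);
    std::vector<std::size_t> needCheck;
    for (std::size_t i = 0; i < geps.size(); ++i) {
        if (!analysis.isProvenInBounds(i)) {
            needCheck.push_back(i);
        }
    }
    return needCheck;
}
```

The point of the split is that the client never inspects indices itself; 
it only asks the analysis which accesses were proven safe, so no extra 
information has to be encoded into the IR to pass the results along.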

Now, having said that, static array bounds checking in Clang is probably 
a very good thing for the Clang static analyzer, and having strong 
static analysis tools for finding bugs is a good thing, so if anyone 
wants to build static array bounds checking for Clang, go for it.  
However, I can't mentor such a project (I have no experience with Clang 
analyses), and it won't benefit my project (SAFECode) very easily.

>  I also want to know: is it possible to detect memory
> leaks at a very low (LLVM IR) level?

I don't see why not.  I believe Valgrind does it on assembly code; you 
could probably build an LLVM transform that does what Valgrind does but 
does it more efficiently (primarily because using LLVM as a static 
compiler removes the dynamic binary translation overhead).
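
As a rough illustration of the run-time side of such a transform, the 
sketch below shows a tiny tracking library that instrumented code could 
call in place of malloc and free.  It is an assumption-laden toy, not how 
Valgrind works internally (Valgrind also checks whether leftover blocks 
are still reachable); ToyLeakTracker and its methods are hypothetical 
names, and the LLVM transform that would rewrite the calls is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical run-time support that an instrumenting transform could
// target: allocation calls are redirected here so live blocks are recorded.
class ToyLeakTracker {
public:
    // Replacement for malloc: allocate, then remember the block and its size.
    void *trackedMalloc(std::size_t size) {
        void *p = std::malloc(size);
        if (p != nullptr) {
            live[p] = size;
        }
        return p;
    }

    // Replacement for free: forget the block, then release it.
    void trackedFree(void *p) {
        // Freeing a pointer we never handed out (or freeing twice) is a bug.
        assert(p == nullptr || live.count(p) == 1);
        live.erase(p);
        std::free(p);
    }

    // Blocks still allocated; anything left at program exit is a leak candidate.
    std::size_t liveBlocks() const { return live.size(); }

    // Total bytes still allocated.
    std::size_t liveBytes() const {
        std::size_t total = 0;
        for (const auto &entry : live) {
            total += entry.second;
        }
        return total;
    }

private:
    std::unordered_map<void *, std::size_t> live;
};
```

A report at exit would simply walk the remaining entries; because the 
calls are inserted at compile time, no binary translation is needed to 
intercept them.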

>  Or, at the
> LLVM IR level, to provide stackful hooks? It would be very useful to
> have such a feature. The stack hooks could help us print extra stack
> info at execution time without modifying the original code, to help us
> find bugs more easily :)
>   

I'm not sure what you mean here.  Can you clarify?

-- John T.

P.S. I use the term "Clang IR" to mean whatever data structures Clang 
uses to represent code.  I believe it uses Abstract Syntax Trees 
(ASTs).  Perhaps I should have said ASTs...