[cfe-dev] clang vs llvm for verifying code

Mon May 11 09:45:49 PDT 2009

Hi Nuno,

I'll throw my 2 cents in since I have some experience in related areas
(I have working on source checking tools on top of both LLVM IR and
CIL).

My main recommendation would be to go with LLVM IR, unless you really
feel there are some semantics of the source language which are not
accurately captured in the IR. Especially for a "verification" tool,
it is very important that the core implementation be correct and this
is much easier to verify if the language you accept is LLVM IR. But,
of course, I'm biased... :)

Some other comments interleaved:

On Sat, May 9, 2009 at 4:42 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
> Hi,
>
> Next week I'm going to start to write a tool to verify the absence of buffer
> overflows in C code. I already wrote the backend (a model checker), but now
> I need a frontend to dump the states+transitions info.
> My question is the following: should I dump this info directly from clang's
> AST or from LLVM bitcode? From what I understand, both have their advantages
> and disadvantages:
>
> clang:
>  + we're close to the source. it's easy to pinpoint the bug down to the line
> number

If line number information is good enough for you, you can get this as
well from the debug info in LLVM IR. Klee does this and it works quite
well, and probably works better now that the optimization passes are
friendlier to debug info.

>  + simple constant folding

With IR you can of course use the whole set of IR optimization passes.

>  - AFAIK, merging the separate files' AST into a single, whole-program AST
> is not there yet. I may be able to merge the info later in the chain,
> though.
>  - no (inter-procedural) alias analysis

A big '+' here is the ability to use information that is present in
the language but not the LLVM IR. For example, there are things in C
which are undefined behavior, but are well defined once they have been
compiled. If your analysis cares about this distinction, then you need
the C input.

> llvm:
>  + easy merging of the whole program into a single bitcode file
>  + inter-procedural alias analysis. especially necessary for figuring out
> the callees of indirect function calls
>  + possibility to run cheap optimizations to reduce the code size
>  + the semantics of the C language are already lowered to simpler
> instructions
>  - not clear to me if it's possible to accurately pinpoint the location of
> an error to the source program (maybe take advantage of the debug info?)

Another + you didn't mention is the capability of taking inputs which
are not written in C. Also, for debugging and testing you can test .ll
files directly, which may be simpler or easier to debug than their
corresponding C inputs.

> CIL:
>  + seems to be used by many in the verification community

Yes, although as far as I know it hasn't been under active development
for a while, although it does still get maintained. However, in
general it works very well for C code. One of the additional '+'s of
CIL is that it lowers many annoying constructs in C so that your
transformation/analysis gets to deal with a much more regular
language.

>  + easy merging of the whole program AST
>  + many different alias analysis algorithms
>  - no C++ support
>  - written in ocaml... bah!

This is a '+'. Ocaml is a language that is definitely worth learning
if you haven't used it, and doing source transformation with CIL is
quite elegant. That said, CIL does have a number of limitations and
these days I think it is easier to build a robust system on either
LLVM or Clang.

 - Daniel

> Does anyone has any other point to add? Any recommendation?
>
> Thanks,
> Nuno
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>