[cfe-dev] clang vs llvm for verifying code

Nuno Lopes nunoplopes at sapo.pt
Mon May 11 15:38:06 PDT 2009


Hi,

Thanks Daniel for sharing your thoughts on this. I really appreciate it.
My initial intuition seem to agree with your points. Now I just need to 
convince my advisor :) (your e-mail will certainly help)

Thanks,
Nuno

P.S.: about ocaml, well I've only written 2 lines of it last week (to fix a 
bug in a CIL plugin) and, well, it looks "different" at first sight, at 
least.


----- Original Message ----- 
From: "Daniel Dunbar" <daniel at zuster.org>
To: "Nuno Lopes" <nunoplopes at sapo.pt>
Cc: <cfe-dev at cs.uiuc.edu>
Sent: Monday, May 11, 2009 5:45 PM
Subject: Re: [cfe-dev] clang vs llvm for verifying code


Hi Nuno,

I'll throw my 2 cents in since I have some experience in related areas
(I have working on source checking tools on top of both LLVM IR and
CIL).

My main recommendation would be to go with LLVM IR, unless you really
feel there are some semantics of the source language which are not
accurately captured in the IR. Especially for a "verification" tool,
it is very important that the core implementation be correct and this
is much easier to verify if the language you accept is LLVM IR. But,
of course, I'm biased... :)

Some other comments interleaved:

On Sat, May 9, 2009 at 4:42 AM, Nuno Lopes <nunoplopes at sapo.pt> wrote:
> Hi,
>
> Next week I'm going to start to write a tool to verify the absence of 
> buffer
> overflows in C code. I already wrote the backend (a model checker), but 
> now
> I need a frontend to dump the states+transitions info.
> My question is the following: should I dump this info directly from 
> clang's
> AST or from LLVM bitcode? From what I understand, both have their 
> advantages
> and disadvantages:
>
> clang:
> + we're close to the source. it's easy to pinpoint the bug down to the 
> line
> number

If line number information is good enough for you, you can get this as
well from the debug info in LLVM IR. Klee does this and it works quite
well, and probably works better now that the optimization passes are
friendlier to debug info.

> + simple constant folding

With IR you can of course use the whole set of IR optimization passes.

> - AFAIK, merging the separate files' AST into a single, whole-program AST
> is not there yet. I may be able to merge the info later in the chain,
> though.
> - no (inter-procedural) alias analysis

A big '+' here is the ability to use information that is present in
the language but not the LLVM IR. For example, there are things in C
which are undefined behavior, but are well defined once they have been
compiled. If your analysis cares about this distinction, then you need
the C input.

> llvm:
> + easy merging of the whole program into a single bitcode file
> + inter-procedural alias analysis. especially necessary for figuring out
> the callees of indirect function calls
> + possibility to run cheap optimizations to reduce the code size
> + the semantics of the C language are already lowered to simpler
> instructions
> - not clear to me if it's possible to accurately pinpoint the location of
> an error to the source program (maybe take advantage of the debug info?)

Another + you didn't mention is the capability of taking inputs which
are not written in C. Also, for debugging and testing you can test .ll
files directly, which may be simpler or easier to debug than their
corresponding C inputs.

> CIL:
> + seems to be used by many in the verification community

Yes, although as far as I know it hasn't been under active development
for a while, although it does still get maintained. However, in
general it works very well for C code. One of the additional '+'s of
CIL is that it lowers many annoying constructs in C so that your
transformation/analysis gets to deal with a much more regular
language.

> + easy merging of the whole program AST
> + many different alias analysis algorithms
> - no C++ support
> - written in ocaml... bah!

This is a '+'. Ocaml is a language that is definitely worth learning
if you haven't used it, and doing source transformation with CIL is
quite elegant. That said, CIL does have a number of limitations and
these days I think it is easier to build a robust system on either
LLVM or Clang.

 - Daniel

> Does anyone has any other point to add? Any recommendation?
>
> Thanks,
> Nuno 




More information about the cfe-dev mailing list