[LLVMdev] Integrating LLVM in an existing project

Thu Apr 5 05:10:52 PDT 2007

Hi everyone,

After some time hacking on llvm, let me introduce myself :)
I'm a PhD student at the French university Pierre et Marie Curie in
Paris. I work on a project called the "Virtual Virtual Machine"
project. You can find some (dated) information on the website
http://vvm.lip6.fr.

Basically it's a "low level virtual machine" :) with a just-in-time
compiler called the VPU and an execution environment (GC + threads). It
parses a lisp-like language that is translated to the VPU's internal
bytecode.

On top of this execution environment we have implemented a Java virtual
machine and a .Net virtual machine. They are both functional and we
achieve reasonable performance (1/3 of the performance of IBM's JVM or
Mono).

However, we think our just-in-time compiler is the main obstacle to
better performance. Our register allocator is really simple and we
don't have any basic optimization passes. So we decided to take a look
at llvm and see whether it was possible to translate the VPU's internal
bytecode to llvm's bytecode. After porting llvm to linux/ppc and adding
some functionality to llvm (such as knowing the required size of a
method's code before allocating memory for the method), we can now
execute a large amount of code written in our lisp-like language.

So the next step was to run our Java and .Net virtual machines on top
of the new llvm-based execution environment. They are both implemented
in the lisp-like language, so we expected few changes, if any, and that
was the case. However, when executing Java or .Net applications we ran
into the problem of exception handling.

Exception handling is not integrated in our execution environment, and
the JIT is not aware of it. Exceptions are therefore handled at the
application level with setjmp and longjmp. When a method with
exceptions is compiled, we set labels (start, end, handler) in the
compiler for each exception and, after compilation, grab the addresses
of these labels in the generated code. When an exception is thrown, we
compare the current instruction pointer with all (start, end) pairs of
the current method. If the IP is in the interval and the exception type
matches, we longjmp to an instruction in the method's code which will
jump to the handler. If not, we look at the calling method's
exceptions, and so on until we reach the end of the backtrace.
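
To make the lookup concrete, here is a minimal sketch in C of the table
search described above. It is not our actual code: the names ExcEntry,
Frame, find_in_method and find_handler are made up for illustration, the
interval is assumed to be half-open (start <= IP < end), and the type
check is simplified to an exact match on a hypothetical integer tag. The
final longjmp to the handler is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One protected region of a compiled method (hypothetical layout). */
typedef struct {
    uintptr_t start;    /* address of the "start" label */
    uintptr_t end;      /* address of the "end" label (assumed exclusive) */
    uintptr_t handler;  /* address of the "handler" label */
    int       type_id;  /* hypothetical tag for the caught exception type */
} ExcEntry;

/* One frame of the backtrace: the IP inside a method plus that method's table. */
typedef struct {
    uintptr_t       ip;
    const ExcEntry *entries;
    size_t          n_entries;
} Frame;

/* Search a single method's table for a region that covers the frame's IP
 * and catches the thrown type. Returns NULL if there is none. */
const ExcEntry *find_in_method(const Frame *f, int thrown_type)
{
    for (size_t i = 0; i < f->n_entries; ++i) {
        const ExcEntry *e = &f->entries[i];
        if (f->ip >= e->start && f->ip < e->end && e->type_id == thrown_type)
            return e;
    }
    return NULL;
}

/* Walk the backtrace from the innermost frame (index 0) outwards.
 * Returns the handler address, or 0 if no method handles the exception. */
uintptr_t find_handler(const Frame *stack, size_t depth, int thrown_type)
{
    assert(stack != NULL || depth == 0);
    for (size_t i = 0; i < depth; ++i) {
        const ExcEntry *e = find_in_method(&stack[i], thrown_type);
        if (e != NULL)
            return e->handler;
    }
    return 0;
}
```

The point of the sketch is that every field of ExcEntry has to be a real
address in the generated code, which is why we need the labels to
survive code generation.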

This algorithm does not work with llvm because creating a label (which
corresponds to creating a basic block) does not imply that the label
(i.e. the basic block) will have an address. Even without optimizations
enabled in llvm, some basic blocks are not emitted (obviously because
some basic blocks are useless).

We cannot use the current exception handling in llvm; see
http://www.nondot.org/sabre/LLVMNotes/ExceptionHandlingChanges.txt.
We cannot use the llvm.dbg.stoppoint feature either, because it is not
implemented in llvm's JIT.

So we are stuck :). However, we would really like to see what
performance gains we would get with llvm.

So here are a few questions whose answers would help me get past this
issue:
1) Is Chris's exception handling note actually implemented, or is it
still only planned? And how difficult do you expect it to be to
implement? (Even though I have implemented some things in llvm, I am
still not entirely comfortable with the code.)
2) llvm.dbg.stoppoint: how far is it actually implemented?
3) Getting the address of basic blocks: is there a workaround?

Thanks a lot for your answers. And don't hesitate to ask me for more
information if anything in my explanation is unclear.

Best,
Nicolas