[LLVMdev] OCaml bindings to LLVM

Mon Sep 8 12:17:01 PDT 2008

On 2008-09-05, at 23:26, Jon Harrop wrote:

> I'm having another play with LLVM using the OCaml bindings for a
> forthcoming OCaml Journal article and I have a couple of remarks:
>
> Firstly, I noticed that the execute engine is very slow, taking  
> milliseconds to call a JIT compiled function. Is this an inherent  
> overhead or am I calling it incorrectly or is this something that  
> can be optimized in the OCaml bindings?

The high-level calling convention using GenericValue is going to be
very slow relative to a native function call. This is true in C++, and
even more so in OCaml, which must cons up a bunch of objects on the
heap for each call. For best performance, avoid fine-grained calls
into JIT'd code, e.g. by iterating over inputs inside the JIT'd code
instead of outside.
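
As a rough sketch of the difference between the two calling patterns (not real binding code): the generic_value type and call_boxed below are hypothetical stand-ins for a GenericValue-style interface, and the "JIT'd" functions are ordinary OCaml closures. The counter only shows how many times each pattern crosses the simulated boundary.

```ocaml
(* Hypothetical stand-in for a GenericValue-style boxed argument. *)
type generic_value =
  | Int of int
  | Ints of int array

(* Counts how many times we cross the (simulated) call boundary. *)
let crossings = ref 0

(* Stand-in for a generic entry point: every call passes boxed
   arguments in an array and gets a boxed result back. *)
let call_boxed (f : generic_value array -> generic_value) args =
  incr crossings;
  f args

(* "JIT'd" function that squares one integer. *)
let square = function
  | [| Int x |] -> Int (x * x)
  | _ -> invalid_arg "square"

(* "JIT'd" function that loops over its input itself. *)
let sum_squares = function
  | [| Ints xs |] -> Int (Array.fold_left (fun acc x -> acc + x * x) 0 xs)
  | _ -> invalid_arg "sum_squares"

let unbox_int = function
  | Int x -> x
  | Ints _ -> invalid_arg "unbox_int"

(* Fine-grained: one boundary crossing per element. *)
let sum_squares_fine xs =
  Array.fold_left
    (fun acc x -> acc + unbox_int (call_boxed square [| Int x |]))
    0 xs

(* Coarse-grained: one boundary crossing for the whole array. *)
let sum_squares_batched xs =
  unbox_int (call_boxed sum_squares [| Ints xs |])
```

Both functions compute the same sum, but the fine-grained version pays the boxing and dispatch cost once per element, while the batched version pays it once per array.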

If you want to improve performance of the GenericValue-based
interface, I'd suggest first trying to minimize the number and overhead
of allocations in your OCaml code, then looking at the bindings
themselves:

- If GenericValues can't be reused, add bindings to allow mutating  
them. Reuse the same 'n' instances for each call into JIT code. Yucky  
imperative data structures to the rescue.

- Write bindings for a heap-allocated GenericValue[] and wrap that in  
a custom block instead of heap-allocating each GenericValue  
individually. Of course such an array must be mutable. More imperative  
data structures!

- Try using placement new to initialize GenericValues inside of OCaml
blocks instead of new'ing them up on the C++ heap as is presently
done. This would be outside the bounds of standard C++, so it could
fail. This would require circumventing the C bindings, since those
cannot expose the C++ GenericValue class as a struct.

- Use OCaml variants for inputs (type 'a generic_value = Pointer of 'a |
Int of bits * value | ...) and convert those to a stack-based
SmallVector<GenericValue>. This will avoid finalizers on the OCaml
blocks. This doesn't work symmetrically for outputs, though. Likewise,
it involves going around the C bindings.
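
To make the variant idea a bit more concrete, here is a minimal sketch of what the OCaml side might look like. Everything in it is an assumption for illustration: the generic_value constructors, run_add (a stand-in for a binding that would copy the arguments into a temporary C++ vector before calling the JIT'd function), and the choice to reuse a single argument array across calls.

```ocaml
(* Hypothetical input representation, not the type the bindings expose. *)
type generic_value =
  | Pointer of nativeint   (* raw address *)
  | Int of int * int64     (* bit width, value *)
  | Float of float

(* Stand-in for a binding that would copy the arguments into a
   temporary C++ vector and call the JIT'd function; here it simply
   adds two integer arguments. *)
let run_add (args : generic_value array) : int64 =
  match args with
  | [| Int (_, a); Int (_, b) |] -> Int64.add a b
  | _ -> invalid_arg "run_add: expected two Int arguments"

(* One argument array, allocated once and overwritten before each call. *)
let args = Array.make 2 (Int (32, 0L))

let add x y =
  args.(0) <- Int (32, x);
  args.(1) <- Int (32, y);
  run_add args
```

The variant values are still allocated on the OCaml heap, but they are plain OCaml blocks, so no finalizer has to run when they are collected. Note that the result comes back as a bare int64 rather than a generic_value, which reflects the asymmetry for outputs mentioned above.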

But realize that a GenericValue-based interface will always be slow
relative to a native call. If you have a specific performance goal,
though, you may be able to eliminate 'enough' overhead for your needs
without much work. All of the above are relatively simple (should be
doable in a day, modulo patch review).

For the very best performance, you really want to call the JIT'd  
function directly—e.g.,

     let nf = native_function name m

where native_function has type string -> Llvm.llmodule -> 'a and nf has
some functional type, like int -> int -> int.
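
One consequence of that 'a result type is that the caller's annotation is the only statement of the function's signature, and nothing checks it. The sketch below imitates this with a purely hypothetical fake_module (a table of untyped values standing in for a module) and ordinary OCaml closures in place of JIT'd code; a real native_function would have to hand back a native entry point instead.

```ocaml
(* Stand-in for a module: a table from function names to untyped values. *)
type fake_module = (string, Obj.t) Hashtbl.t

(* Store a function under a name, forgetting its type. *)
let define (m : fake_module) name f = Hashtbl.replace m name (Obj.repr f)

(* The result type 'a is chosen by the caller and never checked. *)
let native_function (name : string) (m : fake_module) : 'a =
  Obj.obj (Hashtbl.find m name)

let m : fake_module = Hashtbl.create 8
let () = define m "add" (fun (a : int) (b : int) -> a + b)

(* The annotation is the only place the signature is stated. *)
let nf : int -> int -> int = native_function "add" m
```

Annotating nf with the wrong type would still compile and would misbehave at run time, which is the price of this kind of interface.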

However, this is subject to the quirks and complexities of the OCaml
FFI (e.g., overflow arguments passed in a global array on x86, totally
nonstandard calling convention).

- If you know in advance the signatures of the functions you're going
to call, you can write shims in C (similar to those in llvm_ocaml.c)
that add relatively little overhead. These wouldn't really be of
any use to anyone else, though.

- If not, you can generate the shims at runtime using LLVM (even
inline them into the callee), but you will have to reimplement OCaml's
FFI macros for unwrapping values and tracking stack roots. This would
take considerably more effort to implement (esp. portably), but would
be a substantial improvement to the bindings if the helpers were
incorporated therein.

> Secondly, I happened to notice that JIT compiled code executed on  
> the fly does not read from the stdin of the host OCaml program  
> although it can write to stdout. Is this a bug?

This has nothing to do with LLVM.

— Gordon