[LLVMdev] OCaml

Jon Harrop jon at ffconsultancy.com
Sun Nov 25 08:49:33 PST 2007


On Sunday 25 November 2007 12:23, Gordon Henriksen wrote:
> On 2007-11-24, at 21:58, Jon Harrop wrote:
> > - Garbage collection tuned for functional programming
>
> http://llvm.org/docs/GarbageCollection.html
>
> I've been doing some interesting work on this front. Getting Lattner-
> cycles to have it reviewed and integrated is probably the biggest
> challenge; LLVM is a joy to work with even on major surgery like
> this. :)
>
> The goal is that LLVM should be able to take care of the code
> generation aspects of GC while leaving the runtime open-ended. It
> would be nice to provide a GC runtime along with LLVM, but I'm not
> entirely certain how realistic that is given how intertwined GC is
> with the object model. All of this is thoroughly treated in the above
> doc.
>
> Can you elaborate on what tuning you're looking for?

I'll give you a bit of background info:

I'm actually a natural scientist rather than a computer scientist, and I am 
looking for a next-generation technical computing platform for Linux and Mac 
OS X that is open source but commerce-friendly and that provides features that 
can compete with the likes of Microsoft's new language F#.

I am more than willing to knuckle down on the project myself, provided it will 
give me a platform for which I can sell libraries, but I have no expertise in 
this field, so I need all the help that projects like LLVM can give me! :-)

I have been working professionally with the OCaml language for several years 
now and find it to be enormously productive for two main reasons:

. Expressive: like ML
. Fast: like ML for symbolic code and like C/C++ for numeric code

This marriage of features allows OCaml to carve out a huge niche in scientific 
computing between languages like Mathematica and C++. Consequently, OCaml has 
garnered a lot of interest from the scientific community.

Although F# is similarly expressive and fast for numeric code, it is slow for 
symbolic code because its run-time is inherited from .NET and is tuned for 
C#. Programs in ordinary imperative languages like C# rarely allocate and 
discard values at a high rate. In functional languages like F#, however, the 
distribution of value lifetimes is heavily skewed toward a very high rate of 
allocation of short-lived objects. Consequently, idiomatic functional 
code often runs up to 5x slower with F# than with OCaml because the .NET 
run-time is not tuned for this.
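To make that allocation pattern concrete, here is a minimal OCaml sketch of idiomatic pipeline code in which every stage builds a fresh list that becomes garbage almost immediately. The names sum_squares and measure are illustrative only, not from any library; the sketch relies on the standard Gc.quick_stat to count words allocated in the minor heap and the number of minor collections. The exact figures will vary with the compiler version and GC settings.

```ocaml
(* Idiomatic pipeline code: each stage allocates a fresh list, and the
   previous stage's list becomes garbage almost immediately. *)
let sum_squares n =
  List.init n (fun i -> i)          (* n fresh cons cells *)
  |> List.map (fun x -> x * x)      (* n more; the first list is now dead *)
  |> List.fold_left ( + ) 0

(* Run [f] and report its result, the words allocated in the minor heap,
   and the number of minor collections that happened meanwhile. *)
let measure f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  let words = after.Gc.minor_words -. before.Gc.minor_words in
  let minors = after.Gc.minor_collections - before.Gc.minor_collections in
  (result, words, minors)

let () =
  let result, words, minors =
    measure (fun () ->
        let acc = ref 0 in
        for _i = 1 to 100 do
          acc := !acc + sum_squares 10_000
        done;
        !acc)
  in
  Printf.printf "result = %d, minor words = %.0f, minor GCs = %d\n"
    result words minors
```

This is the workload a generational collector is designed for: a minor collection only has to copy the values that are still live, so the dead lists left in the nursery cost almost nothing to reclaim.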

Now, the .NET platform obviously provides a great starting point for 
implementing languages like OCaml and is arguably more suitable than LLVM 
because it is higher-level. However, every problem is an opportunity. In this 
case, I believe LLVM would make it much easier to use a GC tuned for 
functional programming languages, so it should be quite feasible to get 
performance somewhere between that of OCaml and F# without too much 
difficulty.

> > - Exceptions
>
> http://llvm.org/docs/ExceptionHandling.html
>
> LLVM's exception support is tuned toward DWARF "zero-cost exceptions,"
> i.e. C++ exception handling. Anton Korobeynikov and Duncan Sands (who
> is working on Ada) are probably the experts in this area.

Excellent. There is one thing that confuses me about this, though. I 
benchmarked exception handling in OCaml and C++ a while ago and found OCaml 
to be ~6x faster. The best explanation I got was that C++ does not have 
zero-cost exceptions because destructors must be called as the stack is 
unwound, whereas OCaml can just jump back to the handler and leave 
collection to the GC.

So does zero-cost exception handling in C++ refer to a special case where you 
can statically prove that there are no destructors to call, or something?
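For reference, here is a small sketch of the OCaml idiom in question: an exception used as a non-local exit from a traversal. The exception Found and the function find_index are hypothetical names chosen for illustration. Raising Found jumps directly to the enclosing handler; the frames in between have no destructors to run, and the closure and any other values they allocated are simply left for the GC.

```ocaml
(* Hypothetical exception carrying the index where the search stopped. *)
exception Found of int

(* Return the index of the first element satisfying [p], or None.
   The raise abandons Array.iteri mid-loop and lands in the handler. *)
let find_index p arr =
  try
    Array.iteri (fun i x -> if p x then raise (Found i)) arr;
    None
  with Found i -> Some i

let () =
  let data = [| 3; 8; 15; 4; 23 |] in
  match find_index (fun x -> x > 10) data with
  | Some i -> Printf.printf "first element > 10 is at index %d\n" i
  | None -> print_endline "no element > 10"
```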

> > - Some interface to LLVM from OCaml
> >
> > What is the easiest way to interface a front-end written in OCaml
> > with an LLVM backend?
>
> The C and Ocaml bindings in the source tree are intended to cover
> precisely this scenario, and I would recommend them over .ll emission.
> Jan's remark is a bit out of date; the bindings are sufficient for
> code generation now. A few corners of the IR are still not fully
> covered, but extending the bindings to new methods is quite
> straightforward.

Fantastic.

> If ocamlc is on your path, then 'configure; make; make install' should
> install the bindings in your ocaml lib. To link with them, compile
> your program with:
>
>      ocamlopt -cc g++
>
> The LLVM libraries currently bound are:
>
>      llvm.cmxa / .cma
>      llvm_bitwriter.cmxa / .cma
>      llvm_analysis.cmxa / .cma
>
> Their .mli files and the corresponding llvm-c headers (coupled with an
> understanding of the C++ API) are presently the best reference.

Right. I hadn't noticed that LLVM's "make install" had already installed them 
in:

  /usr/local/lib/ocaml/

> > I just rediscovered the OCaml bindings in bindings/ocaml (...). They
> > do indeed look quite complete but I can't find any examples using
> > them.
>
> See an example here, which is an Ocaml program to emit the bitcode for a
> "hello world" program:
>
>      http://lists.cs.uiuc.edu/pipermail/llvmdev/2007-October/010996.html
>
> > I think a translation of the tutorial would be most welcome and
> > about 10x shorter. ;-)
>
> Ah, maybe. Patches are welcome. :)

Wow, this is just great!

I had to tweak your example to get it to compile. Some of the function names 
and signatures have changed (I'm using CVS LLVM), so I've updated them and 
simply dropped the booleans you were passing (no idea what they were for, 
but it works ;-). Also, I think const_string should perhaps null-terminate the 
given string, so I changed your example to pass it a null-terminated string 
instead (a nasty hack).
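The hack amounts to appending the terminator by hand. A minimal sketch, assuming a hypothetical helper nul_terminate (not part of the LLVM bindings), makes that explicit and shows why the IR comment in the code that follows says [14 x i8]: thirteen visible characters plus the NUL byte that puts expects.

```ocaml
(* Hypothetical helper, not part of the LLVM bindings: append the NUL
   byte that C functions such as puts rely on. *)
let nul_terminate s = s ^ "\000"

(* Textual LLVM array type matching a byte string of this length. *)
let llvm_array_type_of s =
  Printf.sprintf "[%d x i8]" (String.length s)

let () =
  let greeting = nul_terminate "Hello, world!" in
  (* 13 visible characters plus the terminator *)
  print_endline (llvm_array_type_of greeting)
```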

My code is:

open Printf
open Llvm

let main filename =
   let m = create_module filename in

   (* @greeting = global [14 x i8] c"Hello, world!\00" *)
   let greeting =
     define_global "greeting" (const_string "Hello, world!\000") m in

   (* declare i32 @puts(i8* ) *)
   let puts =
     declare_function "puts"
       (function_type i32_type [|pointer_type i8_type|]) m in
   
   (* define i32 @main() { entry: *)
   let main = define_function "main" (function_type i32_type [| |]) m in
   let at_entry = builder_at_end (entry_block main) in

   (* %tmp = getelementptr [14 x i8]* @greeting, i32 0, i32 0 *)
   let zero = const_int i32_type 0 in
   let str = build_gep greeting [| zero; zero |] "tmp" at_entry in

   (* call i32 @puts( i8* %tmp ) *)
   ignore (build_call puts [| str |] "" at_entry);

   (* ret i32 0 *)
   ignore (build_ret (const_null i32_type) at_entry);

   (* write the module to a file *)
   if not (Llvm_bitwriter.write_bitcode_file m filename) then exit 1;
   dispose_module m

let () = match Sys.argv with
  | [|_; filename|] -> main filename
  | _ -> main "a.out"

To use it I just do:

$ ocamlopt -dtypes -cc g++ -I /usr/local/lib/ocaml/ llvm.cmxa \
    llvm_bitwriter.cmxa hellow.ml -o hellow
$ ./hellow run.bc
$ llc -f -march=c run.bc -o run.c
$ gcc run.c -o run
run.c:114: warning: conflicting types for built-in function ‘malloc’
run.c: In function ‘main’:
run.c:143: warning: return type of ‘main’ is not ‘int’
$ ./run
Hello, world!

How do I compile straight to native code without going via C? Can we use pipes 
to avoid generating intermediate files?

> > the OCaml bindings in bindings/ocaml (rather than the ones in test/
> > Bindings/OCaml!).
>
> The latter directory contains tests of the former!

Ah, I see. :-)

Shall we port the tutorial to OCaml?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e



