[LLVMdev] Fibonacci example in OCaml

Mon Nov 26 08:21:22 PST 2007

On Nov 26, 2007, at 00:47, Jon Harrop wrote:

> Here is a complete 104-line native code compiler for a tiny subset  
> of OCaml that is expressive enough to compile an external Fibonacci  
> program:
>
> [...]
>
> I was kind of hoping that function pointers would just magically  
> work, so this:
>
>  do (if 1 <= 2 then fib else fib) 40
>
> would run, but instead it produces this error:
>
> $ llc -f run.bc -o run.s
> llc: bitcode didn't read correctly.
> Reason: Invalid instruction with no BB

Try 'Llvm_analysis.assert_valid_module m;' before you write bitcode to  
figure out where things went awry. ('dump_module m;' may also help.)  
GIGO applies (but garbage-in/segfault-out is more likely).

Unfortunately, even if the bindings were more strongly typed, it would  
still be structurally possible to build invalid LLVM code, so you've  
just got to take care not to violate the invariants, then use the  
verifier as a cross-check.

Using an asserts build of LLVM is also helpful, but you should have  
this if you built from SVN.

> I assume I need a real type system that can handle both ints and  
> function pointers?

Something like that.

If you're fond of Ocaml's tagged object model (and it does have some  
nice properties), you'll want to slam every value to some generic  
value type and use lots of bitcasts. The value type is either the  
native integer (i32 or i64 depending on target) or a pointer type,  
probably the recursive pointer type:

   let value_ty =
     let temp_ty = opaque_type () in
     let h = handle_to_type temp_ty in
     refine_type temp_ty (pointer_type temp_ty);
     type_of_handle h

Since LLVM does not have an intptr type, you'd need to know your  
target and parameterize your codegen accordingly to use the integer  
types.

The bitcasts have no runtime cost, and LLVM optimization passes can  
simplify them. Your compiler's semantic analysis phase can be  
responsible for proving type-safety, so it isn't strictly necessary to  
propagate type annotations on expressions into the codegen phase so  
long as the AST nodes themselves are not polymorphic.

However, the potential exists to significantly reduce memory pressure  
(avoiding boxing) by using a typed object model instead of a tagged  
one. In this case, propagating types throughout would be necessary.  
This decision has very significant implications for the compilation of  
polymorphic functions.

> After that, I'll add compound types and a memory leak. ;-)

Indeed.

— Gordon