[LLVMdev] Fibonacci example in OCaml

Mon Nov 26 11:18:24 PST 2007

On Monday 26 November 2007 16:21, Gordon Henriksen wrote:
> Try 'Llvm_analysis.assert_valid_module m;' before you write bitcode to
> figure out where things went awry. ('dump_module m;' may also help.)
> GIGO applies (but garbage-in/segfault-out is more likely).

Ok, thanks for the tip.

> Unfortunately, even if the bindings were more strongly typed, it would
> still be structurally possible to build invalid LLVM code, so you've
> just got to take care not to violate the invariants, then use the
> verifier as a cross-check.

I suspect the OCaml bindings to LLVM could express more of these constraints 
in the static type system. Has anyone tried to leverage this?

I'd certainly like to do this for a higher-level representation. I have lots 
of ideas but I suspect that is OT for this list.

> Using an asserts build of LLVM is also helpful, but you should have
> this if you built from SVN.

I did, yes.

> > I assume I need a real type system that can handle both ints and
> > function pointers?
>
> Something like that.
>
> If you're fond of Ocaml's tagged object model (and it does have some
> nice properties), you'll want to slam every value to some generic
> value type and use lots of bitcasts. The value type is either the
> native integer (i32 or i64 depending on target) or a pointer type,
> probably the recursive pointer type:
>
>    let value_ty =
>      let temp_ty = opaque_type () in
>      let h = handle_to_type temp_ty in
>      refine_type temp_ty (pointer_type temp_ty);
>      type_of_handle h
>
> Since LLVM does not have an intptr type, you'd need to know your
> target and parameterize your codegen accordingly to use the integer
> types.
>
> The bitcasts have no runtime cost, and LLVM optimization passes can
> simplify them. Your compiler's semantic analysis phase can be
> responsible for proving type-safety, so it isn't strictly necessary to
> propagate type annotations on expressions into the codegen phase so
> long as the AST nodes themselves are not polymorphic.
>
> However, the potential exists to significantly reduce memory pressure
> (avoiding boxing) by using a typed object model instead of a tagged
> one. In this case, propagating types throughout would be necessary.
> This decision has very significant implications for the compilation of
> polymorphic functions.

So this is the part where I start asking really basic questions. :-)

As I understand it, the simplest approach is to box all values, which means 
that every value is stored as a struct containing the type (an enum) and the 
value itself (inline if it is no bigger than a pointer or as a pointer to a 
larger data structure otherwise). Something like this:

enum runtype {
  Int;
  Float;
  Array;
}

struct array {
  int length;
  box *a;
}

union value {
  int n;    // 64-bit
  float x;  // 64-bit
  array* u; // >64-bit
}

struct box {
  runtype t;
  value v;
}

So you'd create an int with:

box make_int(int n) {
  box b;
  b.t = Int;
  b.v.n = n;
  return b;
}

Is that right or are values stored as a box*?

I'd rather box everything rather than tag ints to start with. I'll relegate 
that to a potential optimization.

What of this can LLVM's optimizer optimize away for me?

So I have to work out how to generate IR that handles those data structures.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e