[LLVMdev] improving the ocaml binding's type safety

Sun Mar 16 00:21:05 PDT 2008

On Sat, Mar 15, 2008 at 7:33 PM, Gordon Henriksen
<gordonhenriksen at mac.com> wrote:
> After some experimentation, I'd prefer the closed system. LLVM has some type
> peculiarities like the commonality between CallInst and InvokeInst. I find
> that the closed type system lets me express such constraints more naturally.
> Expressing these constraints explicitly in the open system involves
> annotating the C++ class hierarchy with extra variants which are unnecessary
> in the closed model.

It looks like you might be right, and open variants might not be able
to handle the pseudo-shared functions like
llvm::CallInst::doesNotReturn and llvm::InvokeInst::doesNotReturn. We
can't do the naive

val does_not_return : [> `CallInst | `InvokeInst] t -> bool

That signature demands a type containing both tags, and each
instruction type carries only one. With closed variants we can write:

val does_not_return : [< llcallinst | llinvokeinst] t -> bool

Although... what if we just add another variant? This would work:

type llcallinst = [ llinstruction | `CallInst | `CallSite ]
type llinvokeinst = [ llinstruction | `InvokeInst | `CallSite ]
val does_not_return : [> `CallSite] t -> bool

It makes the variant types a little more complicated, but end users
wouldn't work directly with the variants so there might not be that
much added complexity. They'd just specify "llcallinst t" and the
like. The variants would pretty much only get used in llvm.ml.
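
A minimal sketch of the extra `CallSite tag, to check that the idea
type-checks. Everything here is an assumption for illustration: the
module name Phantom_sketch, the constructors make_call, make_invoke and
make_load, the llloadinst type, the definition of llinstruction, and the
record standing in for the real C-side value are all hypothetical.

```ocaml
(* Sketch only: Phantom_sketch and its constructors are hypothetical
   stand-ins, not part of the real Llvm module. *)
module Phantom_sketch : sig
  type 'a t  (* 'a is a phantom tag; it never appears in the data *)

  type llinstruction = [ `Value | `Instruction ]
  type llcallinst = [ llinstruction | `CallInst | `CallSite ]
  type llinvokeinst = [ llinstruction | `InvokeInst | `CallSite ]
  type llloadinst = [ llinstruction | `LoadInst ]

  val make_call : noreturn:bool -> llcallinst t
  val make_invoke : noreturn:bool -> llinvokeinst t
  val make_load : unit -> llloadinst t

  (* Accepts any tag set that includes `CallSite. *)
  val does_not_return : [> `CallSite ] t -> bool
end = struct
  (* Stand-in representation; the real binding would wrap a C pointer. *)
  type 'a t = { noreturn : bool }

  type llinstruction = [ `Value | `Instruction ]
  type llcallinst = [ llinstruction | `CallInst | `CallSite ]
  type llinvokeinst = [ llinstruction | `InvokeInst | `CallSite ]
  type llloadinst = [ llinstruction | `LoadInst ]

  let make_call ~noreturn = { noreturn }
  let make_invoke ~noreturn = { noreturn }
  let make_load () = { noreturn = false }
  let does_not_return v = v.noreturn
end

(* Both instruction kinds are accepted through the shared tag. *)
let call = Phantom_sketch.make_call ~noreturn:true
let invoke = Phantom_sketch.make_invoke ~noreturn:false
let call_result = Phantom_sketch.does_not_return call
let invoke_result = Phantom_sketch.does_not_return invoke

(* Rejected by the type checker, since llloadinst lacks `CallSite:
   Phantom_sketch.does_not_return (Phantom_sketch.make_load ()) *)
```

End users only ever write "llcallinst t" or "llinvokeinst t"; the
`CallSite tag stays inside the type definitions.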

What other problems am I missing? Do you have any ideas about when
adding yet another variant really would break down? I have to think
about this more. I just don't understand polymorphic types very
well.

> Please use 'a Llvm.ty for Type and 'a Llvm.v for Value to save typing. These
> choices avoid conflicting with the common type binding t and the language
> keyword val, but promote these important types to the type names into the
> Llvm module (likely open'd) for brevity's sake.
>
> I don't have a better suggestion than just naming the variant sum types
> Llvm.ll_____. I considered some other options, but decided I'm not fond of
> them in practice.

I think we'd need only one definition of "type 'a whatever". The
phantom type would be enough to distinguish everything. For instance,
lablgtk has a "type 'a obj" that they use as the base type for all of
their variants, which I might copy. We can even hide this type by
naming the variants something like "llfunction_variants" and then
"type llfunction = llfunction_variants obj" to have roughly the same
interface as before.
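
Roughly what I have in mind, as a sketch rather than a finished design.
Only the names obj, llfunction_variants and llfunction come from the
paragraph above; the module name Obj_sketch, the record representation,
and the functions const_int, define_function, value_name and arity are
made up for the example.

```ocaml
(* Sketch only: one abstract phantom-typed base type, with the familiar
   names recovered as abbreviations. The functions are hypothetical. *)
module Obj_sketch : sig
  type 'a obj  (* the single base type; 'a is the phantom variant set *)

  type llvalue_variants = [ `Value ]
  type llfunction_variants = [ llvalue_variants | `Function ]

  (* Same names users see today, now plain abbreviations. *)
  type llvalue = llvalue_variants obj
  type llfunction = llfunction_variants obj

  val const_int : int -> llvalue
  val define_function : string -> int -> llfunction
  val value_name : [> `Value ] obj -> string
  val arity : [> `Function ] obj -> int
end = struct
  (* Stand-in representation for what would be a C pointer. *)
  type 'a obj = { name : string; params : int }

  type llvalue_variants = [ `Value ]
  type llfunction_variants = [ llvalue_variants | `Function ]
  type llvalue = llvalue_variants obj
  type llfunction = llfunction_variants obj

  let const_int n = { name = string_of_int n; params = 0 }
  let define_function name params = { name; params }
  let value_name v = v.name
  let arity f = f.params
end

let f : Obj_sketch.llfunction = Obj_sketch.define_function "main" 2
let c : Obj_sketch.llvalue = Obj_sketch.const_int 7

(* value_name accepts both; arity accepts only the function. *)
let f_name = Obj_sketch.value_name f
let c_name = Obj_sketch.value_name c
let f_arity = Obj_sketch.arity f
```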

What about putting the types (and functions) in modules, like
ModuleProvider? It'd be like my scheme to break up llvm.ml into
multiple files without actually splitting it up :) Then you could open
Llvm and reference Value.t without obscuring anything. If not, then
maybe we should unpack ModuleProvider from a module (which I'd prefer
not to do). I wasn't sure whether it was better to add new functionality
at the top level or in a module, so I erred towards a module. I'd also
like to be able to call "Module.create" instead of "create_module". We
could even open or abbreviate the module name in scope for even
shorter function names than before.
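
A small sketch of the nested-module layout. Nested_sketch stands in for
Llvm here, and the record types and the functions other than
Module.create are assumptions made for the example, not the actual
binding.

```ocaml
(* Sketch only: Nested_sketch plays the role of Llvm; the contents of
   each submodule are hypothetical. *)
module Nested_sketch = struct
  module Module = struct
    type t = { id : string }
    let create id = { id }
    let id m = m.id
  end

  module Value = struct
    type t = { name : string; parent : Module.t }
    let create parent name = { name; parent }
    let name v = v.name
  end
end

(* Opening the outer module exposes Module.t and Value.t without
   putting a bare t or create into scope. *)
open Nested_sketch

let m = Module.create "demo"
let v = Value.create m "x"

(* An abbreviation gives even shorter names where that helps. *)
module M = Nested_sketch.Module
let m2 = M.create "short"
```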

Oh and one last thing that I've been meaning to ask for a long long
time. Do you think we could change the ordering of some of the
arguments? The builder functions, like:

external build_phi : (llvalue * llbasicblock) list -> string ->
llbuilder -> llvalue = "llvm_build_phi"

have the builder as the last argument, instead of the first as is
normally done in OCaml libraries. That also hampers currying, though
I'm not sure how often that would be used. The downside is that we'd
have to translate the argument order in llvm_ocaml.c, but since most of
the functions I want to do this to already have bindings in there, we
wouldn't really have any extra overhead.
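
To show the currying point, here is a toy comparison of the two
argument orders. The builder record and both build_add functions are
invented for the example; they only record text and are not the real
llbuilder or any existing binding.

```ocaml
(* Sketch only: a toy builder that records emitted instructions. *)
type builder = { mutable emitted : string list }

(* Proposed order: builder first. *)
let build_add b lhs rhs name =
  b.emitted <- Printf.sprintf "%s = add %s, %s" name lhs rhs :: b.emitted;
  name

(* Current order: builder last. *)
let build_add_last lhs rhs name b = build_add b lhs rhs name

let b = { emitted = [] }

(* Builder first: partial application fixes the builder once. *)
let add = build_add b
let sum = add "a" "b" "sum"
let twice = add sum sum "twice"

(* Builder last: the same shortcut needs an explicit wrapper. *)
let add_last lhs rhs name = build_add_last lhs rhs name b
let thrice = add_last twice sum "thrice"

(* Instructions in the order they were emitted. *)
let log = List.rev b.emitted
```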