[LLVMdev] LLVM for dynamic languages

Sun Sep 14 14:22:04 PDT 2003

On Fri, 12 Sep 2003, Rayiner Hashem wrote:
> > That said, many dynamic languages should be cleanly mappable to the LLVM
> > layer. What are you thinking about in particular?
> Not thinking of one in particular right now, but it would probably something
> like an object-oriented Scheme.

Ok.

> > recommend writing some language tuned optimizations (as necessary) that
> > would lower the dynamic objects into the primitives the optimizers
> > already understand.

> Yep. I'm assuming that a language-specific optimizer would handle things like
> type inference, so the llvm code would either see plain primitives or a
> dynamic representation:

> struct object {
> 	int typecode;
> };
>
> struct integer {
> 	int typecode;
> 	int value;
> };

Exposing a dynamic representation like this is a good idea.  In fact, if
you do so like this:

struct integer {
  struct object object_base;
  int value;
}

There won't be as much casting in the LLVM code, and a lot of the
optimizers will do good things for you automatically (like limited type
inference).

> Thus, the calling code could manipulate integers with pointers to objects.

Yup, or a variety of other potential "subclasses".

> > If you can express the method dispatch in a way that is useful to the
> > preexisting optimizations, of course, you wouldn't have to do anything
> > special to support this (C++/Java v-tables fit into the catagory).

> As far as I can see, casting needs to be done only once. All the calling code
> knows is that it has a pointer to an object. After using the typecode to
> lookup the method to call, it calls a given function, passing it the object
> parameter. This function knows the precise llvm type of the object, so it
> needs to cast the object* to (say) an integer*.

Yup.  In general, up-casts don't require pointer arithmetic (if the
"classes" are structured like above), only down-casts do.

> Come to think of it, C++/Java code needs to do the same thing, though
> the dispatch mechanism is different.  Since llvm doesn't appear to have
> any notion of inheritence, C++/Java virtual methods should need to cast
> their 'this' pointer to the precise type.

Yes, in exactly the same way.

> I presume this doesn't interfere too much with the other optimizations?

Not so far.  I have toyed with the idea of adding a "safe_cast" operation
or something, which the language front-end guarantees to the optimizer
that the resultant type is correct.  For something like a dynamic_cast in
C++ and virtual method dispatch in many languages, it would be very
useful.  e.g.:

Derived *X = dynamic_cast<Derived*>(base);

would become LLVM code:

  %cond = bool isInstanceOfDerived(base* %base)
   br bool %cond, label %Yes, label %Cont
Yes:
  ; This is guaranteed to be correct by checks the front-end does
  %Casted = safe_cast base* %base to %Derived*
  br label %Cont
Cont:
  %X = phi [%Casted], [null]

This is something that I have thought about adding in the future, but it
is not important enough to our short term goals to add.  In particular,
there is a variety of other types of "front-end hints" which can be passed
into LLVM, which would be useful to represent in a structured way.  Here
are some ideas:

1. Casts that are known correct
2. Functions that never "throw", return, access memory, ...
3. Aliasing properties guaranteed by the language (e.g., parameters in
   Fortran cannot alias, tbaa in C, mutable and immutable objects cannot
   alias in ML, ...)

It's very important that we don't pollute LLVM with a bunch of ad-hoc
wierd stuff though, so we have focused on the core issues, such as the
design, stability, and basic implementation.  In the future, I expect that
some (small, well thought out) extensions can be used to address these
issues...

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/