[LLVMdev] LLVM and managed languages

Wed Jul 6 09:04:20 PDT 2011

On Fri, Jul 01, 2011 at 11:05:44AM -0700, Talin said
> So I've been using LLVM for about 4 years now, and I've posted a lot on this
> list about specific issues. What I would like to do is step back for a
> moment and give my "big picture" assessment of LLVM overall, particularly
> with respect to developing a "managed" language like Java / C# or my own
> language, Tart.

I'm working on an LLVM backend for the Scala compiler[1], so I'm very
interested in the issues you discuss here. Some of them I have
encountered myself already and others I can see on the horizon. I am new
to the list and offer apologies and request pointers if I cover ground
that's been trodden before.

[1] http://greedy.github.com/scala/

> I would also like to list the areas where I *don't need help* from LLVM -
> that is, these are things I can easily handle myself in the frontend:
> 
>    - Dynamic dispatch and virtual methods / multi-methods
>    - Boxing and unboxing of value types
>    - Reflection
>    - Memory management
> 
> I mention these items for two reasons: First, because these are services
> that *would* be provided by a JVM, and I want people to know that the lack
> of these items in LLVM does not in any way count as a detriment. And
> secondly, because I've occasionally heard people on this list ask for
> features like these, and I think it would be a waste of time to spend effort
> implementing something that the frontend can easily handle.

Agree 100%.

> The rest of my issues are more specific:
> 
> *Garbage collection is still way too difficult*. The biggest problem is the
> inability to track SSA values - it requires the frontend to generate very
> inefficient and error-prone code, manually spilling SSA values to the stack
> around nearly every function call. I've written proposals for improving the
> situation, which I won't repeat here - but realistically there's no way that
> I am ever going to have time to learn enough about LLVM's code generation
> passes to implement something like this myself.

I'd appreciate a pointer to the previous proposal and discussion. I
assume that you're talking about automatic spilling of SSA roots to
stack slots at safe points? Is it possible that you could attach
metadata to these SSA values and generate the appropriate LLVM IR to
spill all the values that dominate the safe point in the GC strategy?
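
A rough sketch of that idea, written as a toy Python model rather than
against any real LLVM API. The `Instr` class, the `is_gc_ptr` flag (standing
in for the metadata) and `roots_to_spill` are all hypothetical names. It
handles only a single straight-line block, where "dominates the safe point"
reduces to "is defined before it", and it narrows the set further to values
that are still used afterwards:

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str                # SSA value defined by this instruction ("" if none)
    op: str                  # opcode; every "call" is treated as a safe point
    uses: tuple = ()         # SSA values read by this instruction
    is_gc_ptr: bool = False  # hypothetical stand-in for GC-root metadata


def roots_to_spill(block):
    """Map each safe point index to the GC pointers that must be spilled.

    A value is spilled if it is a GC pointer defined before the safe point
    (so it dominates it in straight-line code) and is used after it.
    """
    result = {}
    for i, ins in enumerate(block):
        if ins.op != "call":
            continue
        defined_before = [p.name for p in block[:i] if p.is_gc_ptr]
        used_after = {u for later in block[i + 1:] for u in later.uses}
        result[i] = [v for v in defined_before if v in used_after]
    return result


block = [
    Instr("a", "alloc", (), True),
    Instr("b", "alloc", (), True),
    Instr("n", "add", (), False),
    Instr("", "call", ("a",)),       # safe point
    Instr("", "store", ("b", "n")),  # b is still live here, a is not
]
spills = roots_to_spill(block)
```

Here only `b` needs a stack slot at the call: `a` is dead afterwards and `n`
is not a GC pointer. A real implementation would need proper dominance and
liveness over the whole control-flow graph.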

> Another area which I'd like to see supported is stack walking - that is,
> starting from the current call frame, iterate through all of the call frames
> above it. Currently the only way to do this is via platform-specific
> assembly language - I'd think it would be relatively simple to make an
> intrinsic that does this. (Note that the current stack intrinsics are
> unusable, because they are (a) unreliable, and (b) inefficient. Taking an
> integer index of a stack frame as a parameter is not the right way to do
> it.)

Do you have something in mind similar to the interface provided by
libunwind (http://www.nongnu.org/libunwind/docs.html) but lighter weight
and integrated with LLVM?
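
As an analogy only (this is CPython, not LLVM or libunwind), the difference
between an index-based and a cursor-based interface can be seen in Python's
own frame introspection: `sys._getframe(depth)` takes an integer index,
while following `frame.f_back` steps a cursor from one frame to its caller.
The helper names below are made up for illustration:

```python
import sys


def walk_stack():
    """Return the names of all active functions, innermost first."""
    names = []
    frame = sys._getframe(0)      # start a cursor at the current frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back      # one step outward, no re-walk from the top
    return names


def inner():
    return walk_stack()


def outer():
    return inner()


names = outer()
```

Each step costs the same no matter how deep the stack is, which is the
property an index-taking intrinsic cannot easily offer.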

> I have to say, if there's any single issue that could make me give up on
> LLVM and switch over to using something like a JVM, it would be this -
> because as far as I can tell, LLVM's garbage collection features are in
> exactly the same state they were in when I started working with LLVM four
> years ago.

To my mind, one of the frustrations of trying to use the GC support in
LLVM is the somewhat sketchy documentation. It'd be nice to have a
straightforward end-to-end example. Since you've implemented a GC using
this support, maybe you could write something up. I know I'd appreciate
it a lot. Do you find that the status table in the GC document is still
accurate? It seems like the features you'd most like to see implemented
here are:

 - Emitting code at safe points
 - Register maps
 - Liveness analysis (it appears that the new lifetime intrinsics may
   enable this)

> *Platform ABI limitations* - Currently LLVM requires the frontend developer
> to know quite a lot about the platform ABI - for example whether you are
> allowed to pass a struct of a certain size as a value type rather than as a
> reference. The thing is, experimental languages like mine generally don't
> care so much about ABI compatibility - sure, we'd like to be able to call C
> library functions once in a while (and we don't mind doing the extra work in
> those cases), but most of the time we just want to pass a data type around
> and expect it to work. Requiring the use of different techniques on
> different platforms makes the situation considerably more complex.

Would it be reasonable to add a new calling convention that, like fastcc,
doesn't need to conform to any platform ABI but, instead of trying to go
fast, provides maximum flexibility?

> *Light-weight coroutines* would be a "nice to have", as would better
> *concurrency primitives*. These are things I could do on my own, but it
> would be better, I think, to have them in LLVM - because in my view of the
> world, anything that requires lots of architecture-specific knowledge
> ideally belongs on the LLVM side of the line.
> 
> There's been a lot of discussion about divide-by-zero errors and other
> *non-declared exceptions*. Having this available would be a great help.

Agreed. I have read some of the proposals about this and it does indeed
seem tricky. At the very least, the instructions that could raise an
exception would have to be marked in some way so that side-effecting code
isn't moved past them.
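
A minimal sketch of the kind of check such a marking would enable, again as
a toy Python model. The `Op` class and its `may_trap` and `side_effect`
flags are hypothetical and do not correspond to any existing LLVM
attribute. Data dependencies are ignored; the only question asked is
whether swapping two adjacent operations could change what is observable if
an exception is raised:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    text: str
    may_trap: bool = False     # hypothetical marker, e.g. on an integer divide
    side_effect: bool = False  # e.g. a store or a call


def can_swap(first, second):
    """Return True if two adjacent, data-independent ops may be reordered.

    Ops that may trap or that have side effects are "pinned" relative to
    each other: swapping two pinned ops could change which effects are
    visible when an exception is raised, or which exception is raised first.
    """
    first_pinned = first.may_trap or first.side_effect
    second_pinned = second.may_trap or second.side_effect
    return not (first_pinned and second_pinned)


store = Op("store %x, %p", side_effect=True)
div = Op("%q = sdiv %a, %b", may_trap=True)
add = Op("%s = add %a, %b")
```

With these definitions the store cannot be moved past the divide, while the
plain add can move freely past either of them.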

> *Named structure types* - as per Chris's proposal - would be a major
> simplification, as it would allow declaration of pointer fields without
> having to import the code that defines the structure of the thing that the
> pointer is pointing to.

I would really like to see this. It would also help when inspecting IR.
Right now, if types have been uniqued, all structures with the same layout
get collapsed to a single name, which makes things awkward to read.

-- Geoff