[LLVMdev] LLVM and managed languages

Wed Jul 6 10:43:16 PDT 2011

On Wed, Jul 6, 2011 at 9:04 AM, Geoff Reedy <geoff at programmer-monk.net> wrote:

> On Fri, Jul 01, 2011 at 11:05:44AM -0700, Talin said
> > So I've been using LLVM for about 4 years now, and I've posted a lot on
> > this list about specific issues. What I would like to do is step back for
> > a moment and give my "big picture" assessment of LLVM overall,
> > particularly with respect to developing a "managed" language like Java /
> > C# or my own language, Tart.
>
> I'm working on an LLVM backend for the Scala compiler[1], so I'm very
> interested in the issues you discuss here. Some of them I have
> encountered myself already and others I can see on the horizon. I am new
> to the list and offer apologies and request pointers if I cover ground
> that's been trodden before.
>
> [1] http://greedy.github.com/scala/
>
> > I would also like to list the areas where I *don't need help* from LLVM -
> > that is, these are things I can easily handle myself in the frontend:
> >
> >    - Dynamic dispatch and virtual methods / multi-methods
> >    - Boxing and unboxing of value types
> >    - Reflection
> >    - Memory management
> >
> > I mention these items for two reasons: First, because these are services
> > that *would* be provided by a JVM, and I want people to know that the lack
> > of these items in LLVM does not in any way count as a detriment. And
> > secondly, because I've occasionally heard people on this list ask for
> > features like these, and I think it would be a waste of time to spend
> > effort implementing something that the frontend can easily handle.
>
> Agree 100%.
>
> > The rest of my issues are more specific:
> >
> > *Garbage collection is still way too difficult*. The biggest problem is
> > the inability to track SSA values - it requires the frontend to generate
> > very inefficient and error-prone code, manually spilling SSA values to the
> > stack around nearly every function call. I've written proposals for
> > improving the situation, which I won't repeat here - but realistically
> > there's no way that I am ever going to have time to learn enough about
> > LLVM's code generation passes to implement something like this myself.
>
> I'd appreciate a pointer to the previous proposal and discussion. I
> assume that you're talking about automatic spilling of SSA roots to
> stack slots at safe points? Is it possible that you could attach
> metadata to these SSA values and generate the appropriate LLVM IR to
> spill all the values that dominate the safe point in the GC strategy?
>
Actually, my ideas have changed somewhat, so I'll start a new thread with
the most recent incarnation.
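
To make the spilling problem concrete, here is a minimal sketch in C of
what "manually spilling to the stack around a call" amounts to. It models
a shadow stack of root slots rather than real LLVM IR, and every name in
it (Frame, push_frame, gc_count_roots, caller, callee) is hypothetical,
chosen only for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shadow-stack frame: one per active function, linked to its caller. */
typedef struct Frame {
    struct Frame *prev;   /* frame of the calling function */
    size_t        nroots; /* number of root slots in this frame */
    void        **roots;  /* stack slots that hold live references */
} Frame;

static Frame *top_frame = NULL;

static void push_frame(Frame *f, void **slots, size_t n) {
    f->prev = top_frame;
    f->nroots = n;
    f->roots = slots;
    top_frame = f;
}

static void pop_frame(void) { top_frame = top_frame->prev; }

/* Stand-in for a collector: counts the non-null roots it can reach. */
static size_t gc_count_roots(void) {
    size_t n = 0;
    for (Frame *f = top_frame; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->nroots; i++)
            if (f->roots[i] != NULL) n++;
    return n;
}

/* A callee that may trigger a collection, i.e. a safe point. */
static size_t callee(void) { return gc_count_roots(); }

/* The caller must store every live reference into a slot before the call. */
static size_t caller(void *a, void *b) {
    void *slots[2] = { NULL, NULL };
    Frame f;
    push_frame(&f, slots, 2);
    slots[0] = a;            /* spill: the value leaves its register */
    slots[1] = b;
    size_t seen = callee();  /* the collector can find a and b here */
    a = slots[0];            /* reload: a moving collector may have updated the slot */
    b = slots[1];
    (void)a; (void)b;
    pop_frame();
    return seen;
}
```

The store/reload pair around callee() is the overhead in question: the
frontend has to emit it around every call that might reach a safe point,
whether or not a collection actually happens.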

> > Another area which I'd like to see supported is stack walking - that is,
> > starting from the current call frame, iterate through all of the call
> > frames above it. Currently the only way to do this is via
> > platform-specific assembly language - I'd think it would be relatively
> > simple to make an intrinsic that does this. (Note that the current stack
> > intrinsics are unusable, because they are (a) unreliable, and (b)
> > inefficient. Taking an integer index of a stack frame as a parameter is
> > not the right way to do it.)
>
> Do you have something in mind similar to the interface provided by
> libunwind (http://www.nongnu.org/libunwind/docs.html) but lighter weight
> and integrated with LLVM?
>
You're in luck - I happen to have a document that explains this very thing:

https://docs.google.com/document/pub?id=1-ws0KYo47S0CgqpwkjfWDBJ8wFhW_0UYKxPIJ0TyKrQ
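
As a rough illustration of the inefficiency point only (not of the
design in the linked document), the sketch below contrasts an index-style
lookup with a cursor-style walk. It uses a simulated chain of frames, not
the real machine stack, and the names CallFrame, frame_at, walk_by_index
and walk_by_cursor are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simulated call frame; a real one would be laid out by the target ABI. */
typedef struct CallFrame {
    struct CallFrame *caller; /* next frame up the stack, NULL at the root */
    const char       *name;
} CallFrame;

/* Index-style lookup: every query restarts from the innermost frame. */
static const CallFrame *frame_at(const CallFrame *innermost, unsigned depth,
                                 unsigned *links) {
    const CallFrame *f = innermost;
    for (unsigned i = 0; i < depth && f != NULL; i++) {
        f = f->caller;
        (*links)++;           /* count each caller link followed */
    }
    return f;
}

/* Visit all frames by asking for depth 0, 1, 2, ... until none is left. */
static unsigned walk_by_index(const CallFrame *innermost) {
    unsigned links = 0;
    for (unsigned d = 0; frame_at(innermost, d, &links) != NULL; d++) {
        /* a client would inspect the frame here */
    }
    return links;
}

/* Cursor-style walk: keep the current frame and step once per frame. */
static unsigned walk_by_cursor(const CallFrame *innermost) {
    unsigned links = 0;
    for (const CallFrame *f = innermost; f != NULL; f = f->caller)
        links++;
    return links;
}
```

For a stack of n frames the index-style walk follows n(n+1)/2 links,
while the cursor follows n. The reliability problem is a separate issue
that this sketch does not model.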

> > I have to say, if there's any single issue that could make me give up on
> > LLVM and switch over to using something like a JVM, it would be this -
> > because as far as I can tell, LLVM's garbage collection features are in
> > exactly the same state they were in when I started working with LLVM four
> > years ago.
>
> In my mind one of the frustrations trying to use the GC support in LLVM
> is the somewhat sketchy documentation. It'd be nice to have a
> straightforward end-to-end example. Since you've implemented a GC using
> this support maybe you could write something up. I know I'd appreciate
> it a lot. Do you find that the status table in the GC document is still
> accurate? It seems like the features you'd like to see implemented most
> here are
>
>  - Emitting code at safe points
>  - Register maps
>  - Liveness analysis (it appears that the new lifetime intrinsics may
>   enable this)
>
> > *Platform ABI limitations* - Currently LLVM requires the frontend
> > developer to know quite a lot about the platform ABI - for example whether
> > you are allowed to pass a struct of a certain size as a value type rather
> > than as a reference. The thing is, experimental languages like mine
> > generally don't care so much about ABI compatibility - sure, we'd like to
> > be able to call C library functions once in a while (and we don't mind
> > doing the extra work in those cases), but most of the time we just want to
> > pass a data type around and expect it to work. Requiring the use of
> > different techniques on different platforms makes the situation
> > considerably more complex.
>
> Is it reasonable to add a new calling convention that, like fastcc,
> doesn't need to worry about conforming to any ABI but, instead of trying
> to go fast, provides maximum flexibility?
>
I'll have to defer to the experts on this one.

> > *Light-weight coroutines* would be a "nice to have", as would better
> > *concurrency primitives*. These are things I could do on my own, but it
> > would be better, I think, to have them in LLVM - because in my view of the
> > world, anything that requires lots of architecture-specific knowledge
> > ideally belongs on the LLVM side of the line.
> >
> > There's been a lot of discussion about divide-by-zero errors and other
> > *non-declared exceptions*. Having this available would be a great help.
>
> Agreed. I have read some of the proposals about this and it does indeed
> seem tricky. At the least, the instructions that could raise an exception
> would have to be marked in some way so that side-effecting code isn't
> moved past them.
>
There have been a couple of proposals put forward, but nothing has
materialized yet. Here is one:

http://code.google.com/p/llvm-stack-switch/wiki/Proposal

There's also this thread:

http://comments.gmane.org/gmane.comp.compilers.llvm.devel/38987
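
Until something like that exists, the usual workaround is for the
frontend to guard each division explicitly. A minimal sketch of such a
guard in C, assuming a hypothetical helper named checked_sdiv (it is not
taken from either proposal above):

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the explicit test a frontend emits before each
 * signed division, so the language can raise its own exception instead
 * of relying on a hardware trap. */
static bool checked_sdiv(int a, int b, int *out) {
    if (b == 0)
        return false;   /* caller raises the language-level divide-by-zero */
    if (a == INT_MIN && b == -1)
        return false;   /* overflow: undefined in C, traps on x86 */
    *out = a / b;
    return true;
}
```

The cost is a compare and branch on every division, which is the
overhead that support for non-declared exceptions would avoid.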

> > *Named structure types* - as per Chris's proposal - would be a major
> > simplification, as it would allow declaration of pointer fields without
> > having to import the code that defines the structure of the thing that
> > the pointer is pointing to.
>
> I would really like to see this. It would also help when inspecting IR.
> Right now, if types have been uniqued, all structures with the same layout
> get collapsed to a single name, which makes things awkward to read.
>
Yep!

> -- Geoff

-- 
-- Talin