[LLVMdev] LLVM and managed languages

Mon Dec 5 16:18:52 PST 2011

Comments inline.

On Sun, Dec 4, 2011 at 1:38 PM, Jon Harrop <
jonathandeanharrop at googlemail.com> wrote:

> Hi Talin,
>
> I wrote HLVM:
>
>  http://www.ffconsultancy.com/ocaml/hlvm/
>
> The whole project took a few man months. HLVM provides an ML-like type
> system, unboxed tuples, discriminated unions, TCO, generic printing, C FFI,
> POSIX threads, mark-sweep GC and both JIT and standalone compilation. I
> wrote several (accurate) garbage collectors for it including a
> stop-the-world mark-sweep collector and they took a few days each. I did so
> by implementing my own shadow stack along the lines of Henderson's
> "Accurate garbage collection in an uncooperative environment". HLVM was
> only ever a hobby project. IMHO, the results were incredible and are a
> testament to the awesomeness of LLVM.
>
> HLVM prototyped many interesting ideas but the most relevant here is the
> use of value types to avoid heap allocation in order to alleviate the
> garbage collector. For example, HLVM is about 3x faster than Java on the
> following ray tracing benchmark because HLVM does no allocation or
> collection whilst tracing whereas Java does a huge amount:
>
>
> http://flyingfrogblog.blogspot.com/2010/01/hlvm-on-ray-tracer-language-comparison.html
>
> I believe that benchmark shows that even the most heavily optimized GC can
> be thrashed by a program that simply avoids GC. Value types make this
> possible and the lack of value types on the JVM is, therefore, a major
> issue.
>
> I've looked over the HLVM code a bit in the past. Note that I do have
value types in my language, although they are not ubiquitous.

> Now, I have some concerns about the statements being made here regarding
> GC support in LLVM. Let me begin by addressing the statement:
>
> > Garbage collection is still way too difficult.
>
> This is completely untrue. You can easily write your own GC simply by
> storing roots explicitly in your own shadow stack. Contrary to what many
> people told me before I did this, performance is fine for many applications:
>
>
> http://flyingfrogblog.blogspot.com/2009/03/current-shadow-stack-overheads-in-hlvm.html
>
> I'm afraid I'm going to have to disagree.

First off, since you mentioned shadow-stack: All that shadow stack does is
tell you at what offsets, for each stack frame, the stack root pointers
are. Now, my garbage collector doesn't use shadow-stack, it uses the *real*
stack - that is, instead of maintaining a parallel linked list of stack
frames which has to be updated on function entry and exit, it uses the
actual call frame pointers to traverse the list of call frames. This avoids
the need to update a global variable (even a thread-local global) each time
a call frame is entered and exited, which is what shadow-stack requires.

The actual code to do this is fairly trivial - much less code involved than
shadow stack, but there is one minor hitch - you have to re-implement it
for each different processor, since different processors store the call
frame pointer in a different register. I think it would be nice if LLVM
supplied a standard way to do this, but it's not something that is a
serious obstacle to me.

But all of that is unimportant - my stack traversal is semantically
equivalent to shadow-stack. None of the rest of what I'm about to say would
make any difference if I was using shadow stack. That's not where the
problems lie.

The main problem, as I see it, is the design of the llvm.gcroot()
intrinsic. The first problem with it is that the argument to llvm.gcroot
must be a memory address - specifically, it must be the result of an alloca
instruction. That means that even if you have nice value types, they have
to live in memory in order for them to be traced. You can't keep a value
type in a register or an SSA value. If you do have some data type in a
register, and it contains pointers that need to be traced, that value must
be written to memory at each call site, and reloaded from memory after the
call returns. (The latter is only true if you have a moving collector,
which I do.)

A second problem is that you are required to explicitly declare as stack
root not only your local variables, but any intermediate results of
expression. In other words, the "root-ness" of a value doesn't
automatically follow along with the value, as it would with a type-based
approach. The result is that the frontend ends up generating horrible,
horrible code that is littered with extra allocas, calls to llvm.gcroot(),
and lots of extra loads and stores.

A third problem is that the optimizer *has* to know about garbage
collection, because the optimizer can do things that might invalidate your
garbage collector's assumptions. For example, the optimizer might decide to
slice a struct in half, placing each half in a register - at which point,
your table of offsets showing where the traceable pointers are is now
meaningless. From what people have told me, LLVM currently handles this by
simply disabling most optimizations for any value which is marked with
llvm.gcroot(). If "root-ness" were a property of a type, then the optimizer
would be able to do whatever it wanted (mostly I guess), and then you'd be
able to scan the output of the optimizer and identify roots by their types.
Stack roots would only need to be traced if they were in a range of code
where they were live. And so on.

I believe the conclusion you meant to draw is that writing a GC using
> LLVM's highly experimental GC support is way too difficult but that is a
> very different statement.
>
> I don't know what you mean by "highly experimental GC support". As far as
I know, LLVM's GC support consists of the combination of shadow-stack + the
llvm.gc intrinsic functions. And as far as I know, there have been no
changes to either of these in over 4 years. Which is to say, it's clearly
not an area which is the focus of anyone's attention.

> This is a really important point because it means that all of the major
> changes discussed so far are just optimizations beyond a simple shadow
> stack. Moreover, I don't think we have any idea how valuable these
> optimizations might be. Therefore, I think this discussion needs to start
> with an objective survey of the possible approaches and some performance
> estimates.
>
> Increasing performance is only half of the story. The other half is making
it easy to use. For me personally, I've managed to work around all of the
issues that I've mentioned, but it took me a long time to get it right, and
even today it's an ongoing burden - the IR code that my frontend generates
is unreadable due to it being cluttered with extra instructions needed to
deal with GC roots, which would not be needed if LLVM had a better means of
indicating roots.

In other words, I started this thread not because I needed the things I was
asking for - I've already worked around the lack of them - but rather
because I want those who come after to me to have a better experience than
I had. My premise at the start of the thread is that more people choose the
JVM over LLVM as a host platform for language experimentation, and I still
think this is true.

> HLVM's benchmarks indicate that the shadow stack overhead is usually under
> 20% but can be as high as 70% with purely functional code. HLVM currently
> pushes all roots onto the shadow stack as soon as they come into scope and
> resets the shadow stack pointer at each function exit. A better approach
> would be to defer pushing until safe points (if any) and push only live
> roots (i.e. those referred to again after the safe point).
>
> I believe HLVM could attain competitive performance even on purely
> functional data structures by using a non-moving mark-region collector. I
> have written several prototypes in C++ but they are very much
> work-in-progress. As the GCs are non-moving there is no need to restore
> pointers after a safe point, which sounds more efficient to me. The
> downside is that temporaries are no longer allocated by bumping a pointer
> into the nursery generation but my mark-region GC uses a small bitwise LUT
> to find the next free block in the current region which is almost as fast.
>
> Others have said that the best possible performance is obtained by making
> the back-end aware of GC pointers. I can well believe that but the cost of
> doing so with LLVM is so large that I think it will be necessary to
> sacrifice optimality for something almost as good that is attainable with a
> lot less effort. In fact, I would much rather see a separate project that
> sits on top of LLVM and provides a safe high-level garbage collected VM for
> language experimenters to use.
>
> I can't say how much difference in performance there would be if LLVM had
a GC-aware optimizer, as opposed to what we have now, which is that
GC-related code is mostly unoptimized (again, from what people have told
me, not from my own observation).

> Incidentally, if anyone wants to get up to speed on GC design I highly
> recommend Richard Jones' new book on the subject:
>
>
> http://www.amazon.co.uk/Garbage-Collection-Handbook-Management-Algorithms/dp/1420082795
>
> Cheers,
> Jon.
>
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On
> Behalf Of Talin
> Sent: 01 July 2011 19:06
> To: LLVM Developers Mailing List
> Subject: [LLVMdev] LLVM and managed languages
>
> So I've been using LLVM for about 4 years now, and I've posted a lot on
> this list about specific issues. What I would like to do is step back for a
> moment and give my "big picture" assessment of LLVM overall, particularly
> with respect to developing a "managed" language like Java / C# or my own
> language, Tart.
>
> Obviously, I feel that LLVM is the best choice out there, otherwise I
> would of switched to using other alternatives long ago. At the same time,
> however, I've observed that the great majority of experimental languages -
> Groovy, Scala, and many others - are built on top of the Java virtual
> machine, which seems to be the platform of choice for language
> experimenters. In a similar vein, if you spend any time studying the
> academic research on garbage collection algorithms, you'll see that the JVM
> also appears to be the platform of choice here as well.
>
> One question to ask is, what factors would weigh heavily in the decision
> to choose the JVM over LLVM when implementing a managed language? On the
> positive side, LLVM's mature and simple APIs make it a breeze to manipulate
> types and functions, and it's modular architecture means I can use the
> parts I want without being forced to use parts I don't need. And the
> ability to go straight to machine code means that I don't need to ship a
> complex virtual machine with my executables.
>
> I would also like to list the areas where I don't need help from LLVM -
> that is, these are things I can easily handle myself in the frontend:
> • Dynamic dispatch and virtual methods / multi-methods
> • Boxing and unboxing of value types
> • Reflection
> • Memory management
> I mention these items for two reasons: First, because these are services
> that would be provided by a JVM, and I want people to know that the lack of
> these items in LLVM does not in any way count as a detriment. And secondly,
> because I've occasionally heard people on this list ask for features like
> these, and I think it would be a waste of time to spend effort implementing
> something that the frontend can easily handle.
>
> Next, I want to list the areas where I think LLVM is lacking:
>
> The first issue is more of a community-related one, which is that IMHO
> LLVM doesn't have a lot of forward momentum in the experimental languages
> space. The vast majority of contributions to the LLVM code base fall into
> three categories: Clang-related work, ports to new backends, and
> optimization research. Parts of LLVM which don't fall into these categories
> - like garbage collection for example - tend to languish.
>
> Note that this is not a criticism, but merely an observation - if it were
> a criticism I'd have to blame myself as much as anyone else. There are good
> reasons why things are as they are - the few people like myself who are
> working on non-C-like languages generally don't have the energy to both
> work on their front ends and at the same time do any serious amount of work
> on the LLVM infrastructure. (You all saw what happened with the union types
> fiasco - I contributed all of the front-end parts for declaring and
> serializing unions, in the hopes that someone more knowledgeable would pick
> up the effort of implementing them on the code generation side - but that
> never happened and the feature got removed after six months of no progress.)
>
> I think that LLVM's best hope in this department is that one of these
> experimental languages becomes popular enough to attract a serious
> contributor base of it's own, and that some of that effort can bleed off
> into improving LLVM. Some of that was actually starting to happen in the
> case of unladen-swallow, before that project was shut down.
>
> The rest of my issues are more specific:
>
> Garbage collection is still way too difficult. The biggest problem is the
> inability to track SSA values - it requires the frontend to generate very
> inefficient and error-prone code, manually spilling SSA values to the stack
> around nearly every function call. I've written proposals for improving the
> situation, which I won't repeat here - but realistically there's no way
> that I am ever going to have time learn to enough about LLVM's code
> generation passes to implement something like this myself.
>
> Another area which I'd like to see supported is stack walking - that is,
> starting from the current call frame, iterate through all of the call
> frames above it. Currently the only way to do this is via platform-specific
> assembly language - I'd think it would be relatively simple to make an
> intrinsic that does this. (Note that the current stack intrinsics are
> unusable, because they are (a) unreliable, and (b) inefficient. Taking an
> integer index of a stack frame as a parameter is not the right way to do
> it.)
>
>
> I have to say, if there's any single issue that could make me give up on
> LLVM and switch over to using something like a JVM, it would be this -
> because as far as I can tell, LLVM's garbage collection features are in
> exactly the same state they were in when I started working with LLVM four
> years ago.
>
> Generating debug info is another area which is very hard to use. The new
> DIBuilder class was a significant improvement, but the whole area is still
> very complex, it's extremely easy to make mistakes and then spend days or
> weeks trying to figure out what went wrong - there's no type safety or
> validation that can prevent you from making such mistakes. Over the last 4
> years I've spent many man-months of time wrestling with DWARF.
>
> (One also wonders if DWARF is even the right answer. Especially compared
> to the JVM - the Java debuggers are able to derive much of their
> information from the reflection and class metadata that's already there in
> the class files, so from that perspective DWARF adds a huge amount of
> overhead.)
>
> Platform ABI limitations - Currently LLVM requires the frontend developer
> to know quite a lot about the platform ABI - for example whether you are
> allowed to pass a struct of a certain size as a value type rather than as a
> reference. The thing is, experimental languages like mine generally don't
> care so much about ABI compatibility - sure, we'd like to be able to call C
> library functions once in a while (and we don't mind doing the extra work
> in those cases), but most of the time we just want to pass a data type
> around and expect it to work. Requiring the use of different techniques on
> different platforms makes the situation considerably more complex.
>
> Light-weight coroutines would be a "nice to have", as would better
> concurrency primitives. These are things I could do on my own, but it would
> be better, I think, to have them in LLVM - because in my view of the world,
> anything that requires lots of architecture-specific knowledge ideally
> belongs on the LLVM side of the line.
>
> There's been a lot of discussion about divide-by-zero errors and other
> non-declared exceptions. Having this available would be a great help.
>
> Named structure types - as per Chris's proposal - would be a major
> simplification, as it would allow declaration of pointer fields without
> having to import the code that defines the structure of the thing that the
> pointer is pointing to.
>
> Anyway that's pretty much my list of big-ticket items. As you can see,
> anyone who cares about these issues would have to think very hard before
> choosing LLVM over the JVM as an implementation platform.
>
> --
> -- Talin
>
>

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111205/1566ad37/attachment.html>