[LLVMdev] Garbage collection

Thu Feb 26 09:25:56 PST 2009

On Feb 26, 2009, at 12:02 AM, Talin wrote:
> With the increasing number of LLVM-based VMs and other projects, I
> suspect that the desire for more comprehensive garbage collection
> support in LLVM is only going to increase.

Absolutely!

> Part of the reason why there isn't more direct support for GC is the
> theory that there is no such thing as a one-size-fits-all collector.
> The argument goes that a really efficient collector design requires
> detailed knowledge of the object model of the language being compiled.

Yes, you do need to have some knowledge about the object model.   
However, it would be perfectly reasonable for LLVM to support and  
include multiple different collectors for different classes of language.

> On the other hand, it is possible to make a counter-argument to this
> theory that goes like this: The Java VM has been used to implement a
> large number of front-end languages efficiently, without requiring a
> special garbage collector for each language.

Most importantly to me, the takeaway from Java is that just having  
something that works "well enough" is really important and helps  
bootstrap a lot of projects, which can then take the "last 10% of  
performance" as an optimization opportunity, instead of being blocked  
from even starting with LLVM.

I'd claim that the Java VM really isn't a good way to implement a Lisp  
VM or something like that.  However, the perf delta induced by the Java  
VM may just *not matter* in the big picture.  At least with LLVM, a  
Lisp implementation could be brought up on an "OOP GC" and switched to  
something more custom as the project develops.

> It also seems to me that even radically different collector designs
> could utilize some common building blocks for heap management, work
> queuing, and so on.

Yes.

> Of course, there is always a danger when creating libraries of the
> "ivory tower syndrome", putting a lot of effort into components that
> don't actually get used. This is why it would be even better to
> create a standard, high performance collector for LLVM that actually
> uses these methods.  <many good thoughts trimmed>

What you see in LLVM right now is really only the second step of the  
planned GC evolution.  The first step was very minimal, but useful for  
bridging to other existing collectors.  The second step was Gordon's  
(significant!) extensions to the system which allowed him to tie in  
the OCaml collector and bring some more sanity to codegen.

While people object to adding high level features to LLVM, high level  
and language-specific features are *great* in LLVM as long as they are  
cleanly separable.  I would *love* to see a composable collection of  
different GC subroutines with clean interfaces built on LLVM "assembly  
language" GC stuff.

In my ideal world, this would be:

1. Subsystems [with clean interfaces] for thread management,  
finalization, object model interactions, etc.
2. Within different high-level designs (e.g. copying, mark/sweep, etc)  
there can be replaceable policy components etc.
3. A couple of actual GC implementations built on top of #1/2.   
Ideally there would only be a couple of high-level collectors that can  
be parameterized by replacing subsystems and policies.
4. A very simple language implementation that uses the facilities, on  
the same order of complexity as the Kaleidoscope tutorial.
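
To make the layering in points 1-3 concrete, here is a minimal C++
sketch. Everything in it is hypothetical and for illustration only:
Object, RootSet, TriggerPolicy, ThresholdPolicy and MarkSweepHeap are
made-up names, not LLVM APIs, and the explicit root list merely stands
in for whatever root enumeration a real code generator would provide.
It shows one high-level design (mark/sweep) parameterized by a
replaceable policy component and a separate root-set subsystem.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical object model: an object is just a list of outgoing references.
struct Object {
    std::vector<Object*> refs;
    bool marked = false;
};

// Subsystem (point 1): root enumeration. A real system would derive roots
// from the stack and globals; here the client registers them by hand.
struct RootSet {
    std::vector<Object*> roots;
};

// Replaceable policy (point 2): decides when the heap should collect.
struct TriggerPolicy {
    virtual ~TriggerPolicy() = default;
    virtual bool shouldCollect(std::size_t heapObjects) const = 0;
};

// One possible policy: collect once the heap holds `limit` objects.
struct ThresholdPolicy : TriggerPolicy {
    std::size_t limit;
    explicit ThresholdPolicy(std::size_t l) : limit(l) {}
    bool shouldCollect(std::size_t n) const override { return n >= limit; }
};

// Collector (point 3): a stop-the-world mark/sweep heap built from the
// pieces above. Swapping the policy does not touch the collector itself.
class MarkSweepHeap {
    std::vector<std::unique_ptr<Object>> objects_;
    std::unique_ptr<TriggerPolicy> policy_;
    RootSet& roots_;

public:
    MarkSweepHeap(std::unique_ptr<TriggerPolicy> policy, RootSet& roots)
        : policy_(std::move(policy)), roots_(roots) {}

    // Allocates a new object, collecting first if the policy asks for it.
    // The caller must root the result before the next allocation.
    Object* allocate() {
        if (policy_->shouldCollect(objects_.size())) collect();
        objects_.push_back(std::make_unique<Object>());
        return objects_.back().get();
    }

    // Runs one full collection and returns the number of objects freed.
    std::size_t collect() {
        // Mark phase: depth-first traversal from the roots using a worklist.
        std::vector<Object*> work(roots_.roots.begin(), roots_.roots.end());
        while (!work.empty()) {
            Object* o = work.back();
            work.pop_back();
            if (o == nullptr || o->marked) continue;
            o->marked = true;
            for (Object* child : o->refs) work.push_back(child);
        }
        // Sweep phase: keep marked objects (clearing the mark), drop the rest.
        std::size_t freed = 0;
        std::vector<std::unique_ptr<Object>> survivors;
        for (auto& o : objects_) {
            if (o->marked) {
                o->marked = false;
                survivors.push_back(std::move(o));
            } else {
                ++freed;
            }
        }
        objects_ = std::move(survivors);
        return freed;
    }

    std::size_t size() const { return objects_.size(); }
};
```

A client would construct a RootSet, hand it to a MarkSweepHeap along
with a policy such as ThresholdPolicy, and register roots as it goes. A
copying collector could in principle reuse the same RootSet and
TriggerPolicy interfaces, which is the kind of sharing the list above
is after.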

As far as I know, there is nothing that prevents this from happening  
today; we just need leadership in the area to drive it.  To avoid the  
"ivory tower" problem, I'd strongly recommend starting with a simple  
GC and language and getting the whole thing working top to bottom.  From  
there, the various pieces can be generalized out etc.  This ensures  
that there is always a *problem being solved* and something that works  
and is testable.

One of the annoying reasons that the GC stuff is only halfway fleshed  
out is that I was working on an out-of-tree project (which of course  
got forgotten about when I left) when developing the GC intrinsics, so  
there is no fully working example in public.

-Chris

ps. Code generation for the GC intrinsics can be improved  
significantly.  We could add new intrinsics that don't pin things to the  
stack, update optimizations, and do many other things if people  
started using the GC stuff seriously.