[LLVMdev] Garbage collection
Chris Lattner
clattner at apple.com
Thu Feb 26 09:25:56 PST 2009
On Feb 26, 2009, at 12:02 AM, Talin wrote:
> With the increasing number of LLVM-based VMs and other projects, I
> suspect that the desire for more comprehensive garbage collection
> support in LLVM is only going to increase.
Absolutely!
> Part of the reason why there isn't more direct support for GC is the
> theory that there is no such thing as a one-size-fits-all collector.
> The argument goes that a really efficient collector design requires
> detailed knowledge of the object model of the language being compiled.
Yes, you do need to have some knowledge about the object model.
However, it would be perfectly reasonable for LLVM to support and
include multiple different collectors for different classes of language.
> On the other hand, it is possible to make a counter-argument to this
> theory that goes like this: The Java VM has been used to implement a
> large number of front-end languages efficiently, without requiring a
> special garbage collector for each language.
Most importantly to me, the takeaway from Java is that just having
something that works "well enough" is really important and helps
bootstrap a lot of projects, which can then take the "last 10% of
performance" as an optimization opportunity, instead of being blocked
from even starting with LLVM.
I'd claim that the Java VM really isn't a good way to implement a Lisp VM
or something like that. However, the perf delta induced by the Java
VM may just *not matter* in the big picture. At least with LLVM, a
Lisp implementation could be brought up on an "OOP GC" and switched to
something more custom as the project develops.
> It also seems to me that even radically different collector designs
> could utilize some common building blocks for heap management, work
> queuing, and so on.
Yes.
> Of course, there is always a danger when creating libraries of the
> "ivory tower syndrome", putting a lot of effort into components that
> don't actually get used. This is why it would be even better to create
> a standard, high performance collector for LLVM that actually uses
> these methods. <many good thoughts trimmed>
What you see in LLVM right now is really only the second step of the
planned GC evolution. The first step was very minimal, but useful for
bridging to other existing collectors. The second step was Gordon's
(significant!) extensions to the system which allowed him to tie in
the Ocaml collector and bring some more sanity to codegen.
While people object to adding high-level features to LLVM, high-level
and language-specific features are *great* in LLVM as long as they are
cleanly separable. I would *love* to see a composable collection of
different GC subroutines with clean interfaces built on LLVM "assembly
language" GC stuff.
In my ideal world, this would be:
1. Subsystems [with clean interfaces] for thread management,
finalization, object model interactions, etc.
2. Within different high-level designs (e.g. copying, mark/sweep, etc)
there can be replaceable policy components etc.
3. A couple of actual GC implementations built on top of #1/2.
Ideally there would only be a couple of high-level collectors that can
be parameterized by replacing subsystems and policies.
4. A very simple language implementation that uses the facilities, on
the order of complexity as the kaleidoscope tutorial.
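To make items 1 to 3 a little more concrete, here is a minimal sketch (not an existing LLVM API; every name below, such as `ObjectModelPolicy`, `FieldVectorModel` and `MarkSweepCollector`, is hypothetical) of a mark/sweep collector that knows nothing about a language's object layout and instead takes a replaceable object-model policy. The explicit work list used during marking is the kind of shared building block that other collector designs could reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy heap object (hypothetical): a mark bit plus outgoing references.
struct Object {
  bool marked = false;
  std::vector<Object*> fields;
};

// Hypothetical object-model subsystem: the front end tells the collector
// how to enumerate the GC pointers inside one of its objects.
struct ObjectModelPolicy {
  virtual ~ObjectModelPolicy() = default;
  virtual void forEachChild(Object* obj,
                            const std::function<void(Object*)>& visit) const = 0;
};

// One concrete policy for the toy layout above.
struct FieldVectorModel : ObjectModelPolicy {
  void forEachChild(Object* obj,
                    const std::function<void(Object*)>& visit) const override {
    for (Object* child : obj->fields) visit(child);
  }
};

// A high-level collector parameterized by the object-model policy.
class MarkSweepCollector {
 public:
  explicit MarkSweepCollector(const ObjectModelPolicy& model) : model_(model) {}

  Object* allocate() {
    heap_.push_back(std::make_unique<Object>());
    return heap_.back().get();
  }

  // Register the address of a variable that holds a GC pointer.
  void addRoot(Object** slot) {
    assert(slot != nullptr);
    roots_.push_back(slot);
  }

  // Runs a full collection and returns the number of objects freed.
  std::size_t collect() {
    mark();
    return sweep();
  }

  std::size_t liveCount() const { return heap_.size(); }

 private:
  // Mark phase: an explicit work list instead of recursion.
  void mark() {
    std::vector<Object*> work;
    auto push = [&](Object* obj) {
      if (obj && !obj->marked) {
        obj->marked = true;
        work.push_back(obj);
      }
    };
    for (Object** slot : roots_) push(*slot);
    while (!work.empty()) {
      Object* obj = work.back();
      work.pop_back();
      model_.forEachChild(obj, push);
    }
  }

  // Sweep phase: compact survivors to the front, clear their mark bits,
  // and destroy everything that was not marked.
  std::size_t sweep() {
    std::size_t kept = 0;
    for (std::size_t i = 0; i < heap_.size(); ++i) {
      if (heap_[i]->marked) {
        heap_[i]->marked = false;
        if (kept != i) heap_[kept] = std::move(heap_[i]);  // frees any unmarked object in that slot
        ++kept;
      }
    }
    std::size_t freed = heap_.size() - kept;
    heap_.resize(kept);  // destroys the remaining unmarked objects
    return freed;
  }

  const ObjectModelPolicy& model_;
  std::vector<std::unique_ptr<Object>> heap_;
  std::vector<Object**> roots_;
};
```

Swapping in a different `ObjectModelPolicy` changes how pointers are found without touching the mark/sweep logic, which is roughly the "parameterized by replacing subsystems and policies" idea; a copying collector could reuse the same policy interface.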
As far as I know, there is nothing that prevents this from happening
today; we just need leadership in the area to drive it. To avoid the
"ivory tower" problem, I'd strongly recommend starting with a simple
GC and language and getting the whole thing working top to bottom. From
there, the various pieces can be generalized out. This ensures
that there is always a *problem being solved* and something that works
and is testable.
One of the annoying reasons that the GC stuff is only halfway fleshed
out is that I was working on an out-of-tree project (which of course
got forgotten about when I left) when developing the GC intrinsics, so
there is no fully working example in public.
-Chris
ps. Code generation for the GC intrinsics can be improved
significantly. We can add new intrinsics that don't pin things to the
stack, update optimizations, and do many other things if people
started using the GC stuff seriously.
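For readers wondering why the current intrinsics "pin things to the stack": a collector can only find and update roots it can read from memory. The sketch below is a hypothetical, hand-written analogue of a shadow-stack scheme (the names `RootFrame`, `g_rootChain`, `RootScope` and `visitRoots` are illustrative, not LLVM's actual runtime interface): each function publishes the addresses of its GC-pointer slots in a linked frame record, and the collector walks that chain. Because the collector reads, and a moving collector may rewrite, those slots, the values have to live in memory rather than in registers across any call that might collect.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-function record listing the stack slots that hold GC pointers.
struct RootFrame {
  RootFrame* next;        // caller's frame record
  std::size_t numRoots;   // number of slots in 'roots'
  void** roots;           // array of slots, each holding a GC pointer
};

// Hypothetical head of the chain; the innermost active function is first.
RootFrame* g_rootChain = nullptr;

// Pushes a frame record on construction and pops it on scope exit,
// mimicking what compiler-generated prologue/epilogue code would do.
class RootScope {
 public:
  RootScope(void** slots, std::size_t n) : frame_{g_rootChain, n, slots} {
    g_rootChain = &frame_;
  }
  ~RootScope() {
    assert(g_rootChain == &frame_);  // frames must be popped in LIFO order
    g_rootChain = frame_.next;
  }
  RootScope(const RootScope&) = delete;
  RootScope& operator=(const RootScope&) = delete;

 private:
  RootFrame frame_;
};

// Collector side: hand the address of every registered root slot to 'visit',
// so the collector can both read and (if it moves objects) update the slot.
template <typename Visitor>
void visitRoots(Visitor&& visit) {
  for (RootFrame* f = g_rootChain; f != nullptr; f = f->next)
    for (std::size_t i = 0; i < f->numRoots; ++i)
      visit(&f->roots[i]);
}
```

Intrinsics that let roots live in registers would need a different mechanism, such as compiler-emitted stack maps, which is the kind of codegen improvement mentioned above.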