[LLVMdev] Garbage collection

Thu Feb 26 00:02:00 PST 2009

One of the more interesting subjects of conversation at the 2008 
developer day was related to garbage collection. With the increasing 
number of LLVM-based VMs and other projects, I suspect that the desire 
for more comprehensive garbage collection support in LLVM is only going 
to increase. (I am now involved in two different open-source projects 
both of which will eventually have a strong requirement for a 
high-performance LLVM-compatible collector.)

The last time I looked, the LLVM support for GC is fairly austere, 
consisting of (a) the bare minimum set of low-level intrinsics needed to 
integrate a collector with a runtime environment, and (b) a couple of 
example of collectors, neither of which AFAIK take full advantage of the 
low-level intrinsics to get optimal performance.

The IR-level intrinsics themselves don't much help you *write* a GC, so 
much as to integrate one with LLVM. What is provided is essentially a 
mechanism for walking the stack, and a means to insert read/write 
barriers into the generated code, which together form a tiny fraction of 
what it would take to design a working collector.

Part of the reason why there isn't more direct support for GC is the 
theory that there is no such thing as a one-size-fits-all collector. The 
argument goes that a really efficient collector design requires detailed 
knowledge of the object model of the language being compiled. Some 
languages use explicit type information embedded in the object, which 
has a specific offset and a specific format that is likely to be 
different for different languages. Other languages have the benefit of 
compile-time knowledge of the type of an entire object graph, and thus 
do not require the overhead of a per-object type field.

On the other hand, it is possible to make a counter-argument to this 
theory that goes like this: The Java VM has been used to implement a 
large number of front-end languages efficiently, without requiring a 
special garbage collector for each language. Furthermore, I've observed 
that the Java VM, more than any other runtime environment, tends to be 
used as a platform for academic research into GC algorithms, without 
requiring changes to the set of languages which are hosted by the VM. In 
other words, both the language and the GC algorithm can be varied 
independently, from which we can conclude that perhaps the choice of 
language and GC aren't as tightly coupled as we might think.

It might also be argued that many of the languages that people are 
wanting to build with LLVM have Java-like object semantics anyway, at 
least at the level of GC. (Although finalization is an issue, since this 
is an area in which language semantics vary widely.)

It also seems to me that even radically different collector designs 
could utilize some common building blocks for heap management, work 
queuing, and so on. It might make sense to create a library of such 
components as an auxilliary project to LLVM. For example, while there 
are a large number of collector algorithms, there are a fairly small 
number of designs for write barriers - so the library could provide 
several implementations of such. Similarly, there are only so many ways 
to "stop the world" (for concurrent collectors that require this), and 
this could be provided as well.

Of course, there is always a danger when creating libraries of the 
"ivory tower syndrome", putting a lot of effort into components that 
don't actually get used. This is why it would be even better to create a 
standard, high performance collector for LLVM that actually uses these 
methods.

As I see it, such a collector would include two tracing strategies, one 
based on per-object type information, and one based on static type 
knowledge. Having both types of tracing in a single collector would not, 
I think, cause a serious performance degradation, especially as object 
models such as Java require both types anyway.

Specific object model details such as the offset of the type info within 
an object can be dealt with by implementing the tracing functions in 
LLVM IR, so that all of the various inefficiencies caused by 
parameterizing the object layout get optimized away. It should be 
relatively easy for a frontend to generate the IR for a tracing function 
for each concrete class, while the collector library supplies the code 
that is called by the tracer for each pointer location in memory. 
Tracing functions aren't that complicated anyway, it's the interaction 
between the tracer and the mutator is where things get complex, and that 
is largely independent of things like object memory layouts.

In other words, I think that it should be possible to create a fairly 
comprehensive and efficient collector implementation in which 80% of the 
implementation lives in a library, and the language front end generates 
the remaining 20%. Moreover, it should be possible to design this 
library in a way that makes it possible for people to replace 
significant parts of the collector, so that it can be used as a basis 
for developing a diverse selection of collection strategies. And because 
all of the parts that are specific to the object model are generated by 
the front-end, the collector should work for a large number of languages 
with different object semantics.

The hardest part will be issues involving concurrency and threading. It 
has been my observation that support for concurrency is the one thing 
that you can't build up incrementally - that is, with most aspects of a 
GC you can start with a simple implementation and gradually add 
improvements (multiple generations and so on), but once you add 
concurrency you pretty much have to start over. Unfortunately, I am not 
a huge concurrent programming whiz, which is part of the reason why I 
haven't simply gone and implemented the collector myself. However, this 
does suggest to me that if we want to support concurrency (which I 
believe we will) then it has to be supported from the very beginning.

As I might have mentioned before on this list, I have written a few 
modest components that I plan to use as the basis for a collector which 
I plan to name 'Scarcity'. One of these is a heap manager which replaces 
malloc/free and which has fairly efficient functions for things like 
"free all blocks for which this callback function returns true" which 
can be used to implement mark and sweep.

-- Talin