[LLVMdev] Garbage Collection Project

Talin viridia at gmail.com
Thu Jun 18 23:15:20 PDT 2009

Jon Harrop wrote:
> On Tuesday 16 June 2009 07:37:32 Talin wrote:
>> A while back there was a discussion thread about whether an accurate,
>> concurrent garbage collector could be "generic" in the sense of being
>> able to support multiple different languages efficiently. After having
>> done some work on this, I now believe that this is the case - using C++
>> policy-based design principles, you can create a set of modules that
>> represent different aspects of collector behavior (such as
>> mark-and-sweep, stop-the-world, and so on) along with different aspects
>> of the runtime environment (object tracing strategies, heap structures,
>> threading primitives, atomics), and encode these various behaviors as
>> template classes which can be bound together to create an efficient
>> collector.
> Hi Talin,
> This is great news! I have some questions...
And thank you for your interest! :)
> I had great success using some seemingly-unusual techniques in my experiments.
> Firstly, rather than using a single 1 word pointer to represent a reference I 
> chose to use 3 words including a pointer to the type and a pointer to the 
> value (as well as metadata). This allows typed nulls and that addresses an 
> important deficiency found in most other VMs including the CLR. Is Scarcity 
> able to handle such references or does its implementation of stack frames 
> require references to be a single word?
As you no doubt know, there are a couple of different approaches to 
obtaining the type information for an object reference. These can be 
divided into static typing, where the compiler knows in advance the type 
of the reference, and object tagging, in which the type information is 
stored with the object being referred to (usually in the object itself, 
although your case its with the reference.)

Scarcity attempts to unify these two models by declaring that object 
tagging is a special case of static typing. For every object reference 
the compiler is required to supply a tracing strategy for that 
reference, in the form of a function that knows how to trace objects of 
that static type. That tracing strategy may know the entirety of the 
object's structure at compile time, in which case object tags are not 
needed; Or if the reference is polymorphic, then the tracing strategy 
function can inspect the object's tag in order to look up the metadata 
that describes where the references within the object are located. This 
is up to the language implementer to decide. In fact, you could 
conceivably have more than one object hierarchy each with it's own tag 
format, and so long as the compiler can determine statically which tag 
format is being used for a particular reference, it should all work.

The strategy function enumerates all of the references in the object and 
passes them to the trace visitor, which is a functor object supplied by 
the collection algorithm. The strategy function passes into the visitor 
a pointer to each reference, rather than the reference itself - the 
reason for this is to support copying collectors which can modify the 
reference to point to the new location.

This means that for your purposes, Scarcity does not dictate the format 
of object metadata nor the way objects are traced. However, some of the 
collector algorithms will assume that the pointer-to-reference is 
pointing to an object pointer. So you wouldn't be able to use those 
collector algorithms.

In other words, you probably won't be able to use the Scarcity 
collection algorithms for what you want, but what you can do is build a 
collector within the Scarcity framework that does what you want. Whether 
the various modules within Scarcity are valuable enough to be used in 
this way is up to you.
> Secondly, I used LLVM to JIT compile per-type code for garbage collection in 
> order to traverse data structures as efficiently as possible. Moreover, I 
> chose to write the entire GC in an unsafe subset of the language that I was 
> implementing so that I could rely upon tail calls and so forth. Does Scarcity 
> require the GC code to be written entirely in C?
I imagine that most languages will want to write as much as possible in 
the hosted language rather than in C. The question is where to draw the 
dividing line. There are several issues which must be considered:

1) Small snippets of code such as write barriers should be inlined 
wherever possible. This pretty much requires that they be written in the 
same language as the code that they will be embedded within.

2) Many high-level languages really aren't very good at dealing with raw 
memory, because it requires you to introduce pointer arithmetic and 
other "unsafe" operations as you point out. For some languages, adding 
the necessary unsafe operations can distort and corrupt the language 
design to the point where it feels like you are writing a second 
compiler. If such is the case, then it makes sense to offload some of 
these operations to a lower-level language such as C.

3) In a hybrid approach where you are doing high-level stuff in your 
language, and then grunging around with pointers in C, there's some 
efficiency loss due to the fact that the compiler for each language 
cannot inline or optimize code written in the other language. (Although 
if you compile the C part with clang and use link-time optimization, you 
can mitigate that somewhat.) One has to think very carefully how to 
structure the API to avoid frequent jumping back and forth across the 
language barrier.

Because Scarcity is designed as a framework of modules, you can decide 
where to draw that line by deciding which modules that you are going to 
use. For example, you might choose not to use any of Scarcity's 
collection algorithms at all, but you still might use it's 
stop-the-world thread manager, or its memory heap.
> Finally, do you have any references describing the techniques you have used to 
> implement a concurrent GC? I have been reading up on the subject for several 
> years now, with a view to implementing my own concurrent GC.
There's a wiki page on the scarcity site that lists references to a 
whole bunch of papers on garbage collection techniques.


-- Talin

More information about the llvm-dev mailing list