[LLVMdev] Garbage Collection Project
viridia at gmail.com
Thu Jun 18 23:15:20 PDT 2009
Jon Harrop wrote:
> On Tuesday 16 June 2009 07:37:32 Talin wrote:
>> A while back there was a discussion thread about whether an accurate,
>> concurrent garbage collector could be "generic" in the sense of being
>> able to support multiple different languages efficiently. After having
>> done some work on this, I now believe that this is the case - using C++
>> policy-based design principles, you can create a set of modules that
>> represent different aspects of collector behavior (such as
>> mark-and-sweep, stop-the-world, and so on) along with different aspects
>> of the runtime environment (object tracing strategies, heap structures,
>> threading primitives, atomics), and encode these various behaviors as
>> template classes which can be bound together to create an efficient
> Hi Talin,
> This is great news! I have some questions...
And thank you for your interest! :)
> I had great success using some seemingly-unusual techniques in my experiments.
> Firstly, rather than using a single 1 word pointer to represent a reference I
> chose to use 3 words including a pointer to the type and a pointer to the
> value (as well as metadata). This allows typed nulls and that addresses an
> important deficiency found in most other VMs including the CLR. Is Scarcity
> able to handle such references or does its implementation of stack frames
> require references to be a single word?
As you no doubt know, there are a couple of different approaches to
obtaining the type information for an object reference. These can be
divided into static typing, where the compiler knows in advance the type
of the reference, and object tagging, in which the type information is
stored with the object being referred to (usually in the object itself,
although your case its with the reference.)
Scarcity attempts to unify these two models by declaring that object
tagging is a special case of static typing. For every object reference
the compiler is required to supply a tracing strategy for that
reference, in the form of a function that knows how to trace objects of
that static type. That tracing strategy may know the entirety of the
object's structure at compile time, in which case object tags are not
needed; Or if the reference is polymorphic, then the tracing strategy
function can inspect the object's tag in order to look up the metadata
that describes where the references within the object are located. This
is up to the language implementer to decide. In fact, you could
conceivably have more than one object hierarchy each with it's own tag
format, and so long as the compiler can determine statically which tag
format is being used for a particular reference, it should all work.
The strategy function enumerates all of the references in the object and
passes them to the trace visitor, which is a functor object supplied by
the collection algorithm. The strategy function passes into the visitor
a pointer to each reference, rather than the reference itself - the
reason for this is to support copying collectors which can modify the
reference to point to the new location.
This means that for your purposes, Scarcity does not dictate the format
of object metadata nor the way objects are traced. However, some of the
collector algorithms will assume that the pointer-to-reference is
pointing to an object pointer. So you wouldn't be able to use those
In other words, you probably won't be able to use the Scarcity
collection algorithms for what you want, but what you can do is build a
collector within the Scarcity framework that does what you want. Whether
the various modules within Scarcity are valuable enough to be used in
this way is up to you.
> Secondly, I used LLVM to JIT compile per-type code for garbage collection in
> order to traverse data structures as efficiently as possible. Moreover, I
> chose to write the entire GC in an unsafe subset of the language that I was
> implementing so that I could rely upon tail calls and so forth. Does Scarcity
> require the GC code to be written entirely in C?
I imagine that most languages will want to write as much as possible in
the hosted language rather than in C. The question is where to draw the
dividing line. There are several issues which must be considered:
1) Small snippets of code such as write barriers should be inlined
wherever possible. This pretty much requires that they be written in the
same language as the code that they will be embedded within.
2) Many high-level languages really aren't very good at dealing with raw
memory, because it requires you to introduce pointer arithmetic and
other "unsafe" operations as you point out. For some languages, adding
the necessary unsafe operations can distort and corrupt the language
design to the point where it feels like you are writing a second
compiler. If such is the case, then it makes sense to offload some of
these operations to a lower-level language such as C.
3) In a hybrid approach where you are doing high-level stuff in your
language, and then grunging around with pointers in C, there's some
efficiency loss due to the fact that the compiler for each language
cannot inline or optimize code written in the other language. (Although
if you compile the C part with clang and use link-time optimization, you
can mitigate that somewhat.) One has to think very carefully how to
structure the API to avoid frequent jumping back and forth across the
Because Scarcity is designed as a framework of modules, you can decide
where to draw that line by deciding which modules that you are going to
use. For example, you might choose not to use any of Scarcity's
collection algorithms at all, but you still might use it's
stop-the-world thread manager, or its memory heap.
> Finally, do you have any references describing the techniques you have used to
> implement a concurrent GC? I have been reading up on the subject for several
> years now, with a view to implementing my own concurrent GC.
There's a wiki page on the scarcity site that lists references to a
whole bunch of papers on garbage collection techniques.
More information about the llvm-dev