[LLVMdev] Garbage Collection Roots

Gordon Henriksen gordonhenriksen at mac.com
Sun Sep 2 05:31:06 PDT 2007


Hi Talin,

On Sep 2, 2007, at 04:54, Talin wrote:

> I've been looking through the documentation (http://llvm.org/docs/ 
> GarbageCollection.html) on how to implement a garbage collector for  
> LLVM and there's a couple of things that I don't quite understand.  
> Specifically, it says that when a stack variable goes out of scope,  
> you're supposed to assign a null value to it to indicate that the  
> value is no longer live.
>
> What I don't quite understand is how the GC can efficiently use  
> this information.

The GC intrinsics exist for the type of collector which depends on  
metadata compiled alongside the function. At runtime, such a  
collector walks the dynamic call stack and merges that with the  
static metadata to identify dynamically live roots. Like "zero cost  
exception handling" or debug info, collecting this kind of metadata  
requires backend code generator support, which again is the reason  
these intrinsics exist.

> The GC will generally have some data structure that represents a  
> list of all current roots. Calling llvm.gcroot adds a pointer to  
> the list. When the variable goes out of scope, you'll want to  
> remove the pointer from the list. But merely assigning null to the  
> value doesn't cause this to happen, it only prevents the object  
> previously being pointed to from being traced. If the root pointer  
> persists in the table long enough, that same stack address may get  
> re-used for a different variable, which will be non-null, and cause  
> bad things to happen.
>
> It might be clearer if I explained what I want to do. The way that  
> I manager root pointers is as follows: For each application thread,  
> there's a pair of arrays stored in thread local data. One array is  
> for adding roots, the other is for deleting roots. In other words,  
> each time a root gets added, its address gets appended to the 'add'  
> array; Each time a root is deleted, its pointer gets appended to  
> the 'delete' array. Since both arrays are thread-local, there's no  
> locking or synchronization required. If either list gets full, a  
> collection cycle is triggered.
>
> Since the roots aren't actually used *except* during a collection  
> cycle, we don't both compacting the lists until the start of the  
> next collection. At that time, the 'add' list is compacted by  
> removing both duplicate entries and any entries found in the  
> 'delete' list. The 'add' list is then used as the root set for the  
> cycle.

Okay.

> In order for this to work, however, I definitely need a way to know  
> when a reference has gone out of scope so that it can be added to  
> the 'deleted' list.

 From what you've described, your collector does not need any code  
generator support. You could easily ignore the llvm.gc* intrinsics.  
Instead, your front-end could simply insert 'add' and 'remove'  
fragments as source variables enter and exit scope.


That said, it is possible to easily adapt LLVM's model to your  
runtime if you prefer to use the intrinsics. First, observe that the  
stack roots in a stack frame are valid until the function returns.  
Given that, you can adjust the 'add' and 'remove' arrays in a batch  
at the start and end of the function. To avoid keeping objects alive  
unnecessarily long, null the root as the variable goes out of scope,  
just as the LLVM manual suggests.

The existing LowerGC pass also finds (and removes) gcroot intrinsics  
and transforms them into LLVM code at function entry and return. You  
could easily adapt that pass to cooperate with your runtime.


Hope that helps.

— Gordon





More information about the llvm-dev mailing list