[LLVMdev] Garbage Collection Roots

Gordon Henriksen gordonhenriksen at mac.com
Tue Sep 4 00:44:48 PDT 2007


On 2007-09-03, at 23:14, Talin wrote:

> On Sep 2, 2007 5:31 AM, Gordon Henriksen <gordonhenriksen at mac.com>  
> wrote:
>
>> On Sep 2, 2007, at 04:54, Talin wrote:
>>
>>> I've been looking through the documentation (http://llvm.org/docs/ 
>>> GarbageCollection.html) on how to implement a garbage collector  
>>> for LLVM and there's a couple of things that I don't quite  
>>> understand. Specifically, it says that when a stack variable goes  
>>> out of scope, you're supposed to assign a null value to it to  
>>> indicate that the value is no longer live.
>>>
>>> What I don't quite understand is how the GC can efficiently use  
>>> this information.
>>
>> The GC intrinsics exist for the type of collector which depends on  
>> metadata compiled alongside the function. At runtime, such a  
>> collector walks the dynamic call stack and merges that with the  
>> static metadata to identify dynamically live roots. Like "zero  
>> cost exception handling" or debug info, collecting this kind of  
>> metadata requires backend code generator support, which again is  
>> the reason these intrinsics exist.
>
> So, if I understand what you are saying correctly, the  
> "llvm.gcroot" intrinsic isn't actually an "instruction" in the  
> sense I normally think of as an action to be taken; It's more of a  
> marker which is used by the code generator to create a data  
> structure that describes the stack layout. Is that correct?

That's correct; the gcroot intrinsic is an annotation of an alloca.  
This is not without precedent. The llvm.noalias intrinsic is an  
annotation of an SSA pointer value.

> If that's the case, then my next question is: How do the LLVM  
> garbage collection intrinsics work in a multi-threaded environment?  
> If there's a data structure that describes the layout of the stack,  
> and you have multiple stacks, what happens?

The intrinsics are entirely neutral to collector implementation, and  
thus to threading. They could easily be used to implement reference  
counting, for instance, which may or may not be implemented in a  
threadsafe manner. However, as with your algorithm, reference  
counting does not require code generator support, and so would not  
justify the introduction of the intrinsics.

But to elaborate upon the described class of collectors: The emitted  
GC tables do not describe the dynamic call stack; rather, they are  
static data compiled into the executable. Only when cross-referenced  
with the dynamic call stack do they identify the roots. The garbage  
collectors for .NET and Java work in this manner.

Here's an example of the sort of data these tables might contain. (I  
generated this using the -print-gc feature which I have added to llc  
locally; the llc in Subversion does cannot print this data because it  
does not collect it.) This test case has a simple function with 2  
roots and 1 call.

GC roots for fun:
         0       8[sp]
         1       4[sp]
GC safe points for fun:
         label 1: post-call, live = { 0, 1 }

Provided this information, the collector can easily identify the live  
roots within fun's stack frame at runtime—so long as fun is paused at  
the safe point, which it must be if it is not the topmost stack frame.

Walking the stack; identifying the current safe point and stack  
pointer; stopping concurrent threads at safe points; and actually  
tracing the heap—all left as exercises to the interested reader. :)

Here's a resource you might find interesting for further reading:

     http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.html

> BTW, I'm also noticing that the current SemiSpace example code  
> seems quite incomplete.

I observed that myself.

— Gordon

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20070904/192681cb/attachment.html>


More information about the llvm-dev mailing list