[LLVMdev] Garbage collection

Thu Mar 5 22:44:12 PST 2009

BTW, have you looked at MMTk (http://jikesrvm.org/MMTk)? This is the 
garbage collection library that underlies JikesRVM. It is a 
'research-oriented' implementation, meaning that it has lots of 
configurable settings and plugin interfaces for implementing a broad 
range of collection algorithms. I was amused by the fact that "building 
a hybrid copying/mark-sweep collector" is one of the steps in their 
tutorial :)

Of particular interest for LLVM is MMTk's API for communicating with the VM:

   http://rvm.codehaus.org/docs/api/org/mmtk/vm/VM.html
   http://rvm.codehaus.org/docs/api/org/mmtk/vm/Memory.html
   http://rvm.codehaus.org/docs/api/org/mmtk/vm/Barriers.html

MMTk is far too much of a Swiss Army Knife for my purposes (too 
generalized and too complex), but some of its abstractions might still 
be useful starting points for designing the kind of building blocks 
we've been talking about. One major simplification is that we know 
we're building on top of LLVM, so many of the tunable parameters in 
MMTk can be replaced with constants :)

One challenge is that MMTk's idioms for pluggability and extensibility 
are fairly Java-centric. I'm trying to work out the best way to support 
deeply invasive customization (like swapping out the definition of a 
mutex or changing the low-level primitive for zeroing memory) to a 
similar degree, but in a way that matches the idioms of C++. The 
traditional OOP style of customization involving virtual functions and 
subclassing can work for many areas, but for the performance-critical 
components I would rather do the customization via metaprogramming or 
some similar technique where all of the customization decisions are 
made at compile time.
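
To make the compile-time idea concrete, here is a rough sketch of what 
policy-based customization might look like. Every name in it 
(BumpAllocator, NullMutex, MemsetZero) is hypothetical and made up for 
illustration. None of it comes from MMTk or LLVM. The mutex type and 
the zeroing primitive are template parameters, so the choice is fixed 
at compile time and there is no virtual dispatch on the allocation path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

// Hypothetical policy: zero a block of memory with memset.
struct MemsetZero {
    static void zero(void* p, std::size_t n) { std::memset(p, 0, n); }
};

// Hypothetical policy: a no-op lock for single-threaded runtimes.
// It only needs lock()/unlock() to work with std::lock_guard.
struct NullMutex {
    void lock() {}
    void unlock() {}
};

// Hypothetical bump allocator. Mutex and Zeroer are chosen at compile
// time, so calls into them can be inlined (or vanish entirely, as with
// NullMutex). Capacity is assumed to be a multiple of the alignment.
template <typename Mutex, typename Zeroer, std::size_t Capacity>
class BumpAllocator {
public:
    // Returns zeroed memory, or nullptr if the request does not fit.
    void* allocate(std::size_t n) {
        constexpr std::size_t kAlign = alignof(std::max_align_t);
        std::lock_guard<Mutex> guard(mutex_);
        if (n > Capacity) return nullptr;
        // Round up so that every returned pointer stays aligned.
        const std::size_t rounded = (n + kAlign - 1) & ~(kAlign - 1);
        if (rounded > Capacity - used_) return nullptr;
        void* p = buffer_ + used_;
        used_ += rounded;
        assert(used_ <= Capacity);
        Zeroer::zero(p, n);
        return p;
    }

    std::size_t used() const { return used_; }

private:
    Mutex mutex_;
    std::size_t used_ = 0;
    alignas(std::max_align_t) unsigned char buffer_[Capacity];
};

// Two configurations of the same code, selected purely by type.
using SingleThreadedHeap = BumpAllocator<NullMutex, MemsetZero, 1024>;
using ThreadSafeHeap = BumpAllocator<std::mutex, MemsetZero, 1024>;
```

A single-threaded VM would instantiate SingleThreadedHeap and pay 
nothing for locking, while a multithreaded one would pick 
ThreadSafeHeap. A platform with a faster zeroing primitive could supply 
its own policy struct in place of MemsetZero without touching the 
allocator.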

-- Talin

Gordon Henriksen wrote:
> On Feb 26, 2009, at 12:25, Chris Lattner wrote:
>
>   
>> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>>
>>     
>>> With the increasing number of LLVM-based VMs and other projects, I  
>>> suspect that the desire for more comprehensive garbage collection  
>>> support in LLVM is only going to increase.
>>>       
>> What you see in LLVM right now is really only the second step of the  
>> planned GC evolution.  The first step was very minimal, but useful  
>> for bridging to other existing collectors.  The second step was  
>> Gordon's (significant!) extensions to the system which allowed him  
>> to tie in the Ocaml collector and bring some more sanity to codegen.
>>     
>
> I agree; this would be a great contribution, making LLVM much more  
> accessible to the development of novel and existing languages.
>
>   
>> While people object to adding high level features to LLVM, high  
>> level and language-specific features are *great* in llvm as long as  
>> they are cleanly separable.  I would *love* to see a composable  
>> collection of different GC subroutines with clean interfaces built  
>> on LLVM "assembly language" GC stuff.
>>     
>
> Absolutely.
>
> It is definitely valuable that the existing infrastructure doesn't  
> bolt LLVM to a particular runtime. With only a few days of work, PyPy  
> was able to try out the LLVM GC intrinsics and static stack maps and  
> saw a big performance boost from it on their LLVM back-end. (Their GCC  
> backend still outperformed LLVM, but by a much smaller margin.) But  
> this in no way prevents providing GC building blocks for projects that  
> are not working with existing runtimes and GCs.
>
>   
>> As far as I know, there is nothing that prevents this from happening  
>> today, we just need leadership in the area to drive it.  To avoid  
>> the "ivory tower" problem, I'd strongly recommend starting with a  
>> simple GC and language and get the whole thing working top to  
>> bottom.  From there, the various pieces can be generalized out etc.   
>> This ensures that there is always a *problem being solved* and  
>> something that works and is testable.
>>     
>
> I strongly agree with this as well.
>
>   
>> ps. Code generation for the GC intrinsics can be improved  
>> significantly.  We can add new intrinsics that don't pin things to  
>> the stack, update optimizations, and do many other things if people  
>> started using the GC stuff seriously.
>>     
>
>
> I've already commented on this elsewhere in the thread. Promoting GC  
> roots into SSA variables from stack slots would allow much more  
> freedom for the middle- and back-end optimizations, and I think is  
> clearly the next logical step.
>
> — Gordon
>
>