[LLVMdev] Garbage collection

Fri Feb 27 21:29:55 PST 2009

Chris Lattner wrote:
> On Feb 26, 2009, at 12:02 AM, Talin wrote:
>   
>> With the increasing number of LLVM-based VMs and other projects, I
>> suspect that the desire for more comprehensive garbage collection
>> support in LLVM is only going to increase.
>>     
>
> Absolutely!
>
>   
>> Part of the reason why there isn't more direct support for GC is the
>> theory that there is no such thing as a one-size-fits-all collector.
>> The argument goes that a really efficient collector design requires
>> detailed knowledge of the object model of the language being compiled.
>>     
>
> Yes, you do need to have some knowledge about the object model.   
> However, it would be perfectly reasonable for LLVM to support and  
> include multiple different collectors for different classes of language.
>
>   
>> On the other hand, it is possible to make a counter-argument to this
>> theory that goes like this: The Java VM has been used to implement a
>> large number of front-end languages efficiently, without requiring a
>> special garbage collector for each language.
>>     
>
> Most importantly to me, the takeaway from Java is that just having  
> something that works "well enough" is really important and helps  
> bootstrap a lot of projects, which can then take the "last 10% of  
> performance" as an optimization opportunity, instead of being blocked  
> from even starting with LLVM.
>
> I'd claim that JavaVM really isn't a good way to implement a lisp vm  
> or something like that.  However, the perf delta induced by the Java  
> VM may just *not matter* in the big picture.  At least with LLVM, a  
> Lisp implementation could be brought up on an "OOP GC" and switched to  
> something more custom as the project develops.
>
>   
>> It also seems to me that even radically different collector designs
>> could utilize some common building blocks for heap management, work
>> queuing, and so on.
>>     
>
> Yes.
>
>   
>> Of course, there is always a danger when creating libraries of the
>> "ivory tower syndrome", putting a lot of effort into components that
>> don't actually get used. This is why it would be even better to
>> create a standard, high performance collector for LLVM that actually
>> uses these methods.  <many good thoughts trimmed>
>>     
>
> What you see in LLVM right now is really only the second step of the  
> planned GC evolution.  The first step was very minimal, but useful for  
> bridging to other existing collectors.  The second step was Gordon's  
> (significant!) extensions to the system which allowed him to tie in  
> the Ocaml collector and bring some more sanity to codegen.
>
> While people object to adding high level features to LLVM, high level  
> and language-specific features are *great* in llvm as long as they are  
> cleanly separable.  I would *love* to see a composable collection of  
> different GC subroutines with clean interfaces built on LLVM "assembly  
> language" GC stuff.
>
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,  
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)  
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.   
> Ideally there would only be a couple of high-level collectors that can  
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on  
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening  
> today, we just need leadership in the area to drive it.  To avoid the  
> "ivory tower" problem, I'd strongly recommend starting with a simple  
> GC and language and get the whole thing working top to bottom.  From  
> there, the various pieces can be generalized out etc.  This ensures  
> that there is always a *problem being solved* and something that works  
> and is testable.
>
> One of the annoying reasons that the GC stuff is only halfway fleshed  
> out is that I was working on an out of tree project (which of course  
> got forgotten about when I left) when developing the GC intrinsics, so  
> there is no fully working example in public.
>
> -Chris
>
> ps. Code generation for the GC intrinsics can be improved  
> significantly.  We can add new intrinsics that don't pin things to the  
> stack, update optimizations, and do many other things if people  
> started using the GC stuff seriously.
>   
So I guess what would be helpful for me is a roadmap that defines more 
clearly (a) what parts you plan to build in LLVM (beyond what is already 
there), (b) what parts you would like to have contributed, and (c) what 
parts you definitely want to keep external. In particular, I'd like to 
get a clearer picture of the shapes of the various pieces and their roles.

For example, I mentioned the "stop the world" function - however since 
LLVM defines no primitives for creating threads or synchronizing between 
them, it's hard to see how this could be part of LLVM proper. On the 
other hand, a sibling project (like vmkit or clang) could probably make 
a more restrictive set of assumptions, such as the existence of either 
POSIX or Windows threading models or some analog of those being 
available. "Stop the world" is implementable in terms of those threading 
primitives if you assume that mutator threads use sync points.
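
To make that concrete, here is a minimal sketch of "stop the world" built 
only on standard threading primitives, assuming C++11 std::thread, 
std::mutex and std::condition_variable stand in for the POSIX or Windows 
equivalents. Everything named here (SafepointCoordinator, poll, 
demoStopTheWorld) is hypothetical and not part of LLVM or any existing 
project. It also assumes a single collector thread that is not itself a 
registered mutator.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: coordinates mutator threads with one collector.
class SafepointCoordinator {
public:
  // A mutator calls this before touching the heap. It blocks while a
  // collection is in progress so new threads cannot slip past a stop.
  void registerMutator() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return !stopRequested_.load(); });
    ++running_;
  }

  // A mutator calls this when it will no longer touch the heap.
  void unregisterMutator() {
    std::lock_guard<std::mutex> lock(mu_);
    --running_;
    cv_.notify_all();
  }

  // The sync point. Generated code would call this periodically,
  // for example at loop back-edges and function entries.
  void poll() {
    if (!stopRequested_.load()) return;  // fast path: no stop pending
    std::unique_lock<std::mutex> lock(mu_);
    if (!stopRequested_.load()) return;  // stop ended before we got the lock
    --running_;                          // report this thread as parked
    cv_.notify_all();
    cv_.wait(lock, [this] { return !stopRequested_.load(); });
    ++running_;                          // resumed: running again
  }

  // Collector side: returns once every registered mutator is parked.
  void stopTheWorld() {
    std::unique_lock<std::mutex> lock(mu_);
    stopRequested_.store(true);
    cv_.wait(lock, [this] { return running_ == 0; });
  }

  // Collector side: lets parked mutators continue.
  void resumeTheWorld() {
    std::lock_guard<std::mutex> lock(mu_);
    assert(stopRequested_.load() && "resume without a matching stop");
    stopRequested_.store(false);
    cv_.notify_all();
  }

private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::atomic<bool> stopRequested_{false};
  int running_ = 0;  // registered mutators not currently parked
};

// Returns true if no mutator wrote to the "heap" while the world was stopped.
bool demoStopTheWorld() {
  SafepointCoordinator sp;
  std::atomic<long> heapWrites{0};  // stand-in for real heap mutation
  std::atomic<bool> done{false};
  std::vector<std::thread> mutators;
  for (int i = 0; i < 4; ++i) {
    mutators.emplace_back([&] {
      sp.registerMutator();
      while (!done.load()) {
        heapWrites.fetch_add(1);
        sp.poll();
      }
      sp.unregisterMutator();
    });
  }
  std::this_thread::sleep_for(std::chrono::milliseconds(10));
  sp.stopTheWorld();
  long before = heapWrites.load();
  std::this_thread::sleep_for(std::chrono::milliseconds(20));
  long after = heapWrites.load();  // a real collector would trace here
  sp.resumeTheWorld();
  done.store(true);
  for (auto &t : mutators) t.join();
  return before == after;
}
```

Calling demoStopTheWorld() from main should return true. The point of the 
sketch is that the only thing the code generator has to supply is the 
call to poll() at each sync point; the rest is ordinary library code that 
makes no demands on the IR.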

Thus, it seems to me that the proper home for such a function would be 
in a sibling project outside of LLVM proper. At the same time, however, 
core LLVM could benefit from having an implementation of this, in that 
it could guide the design of the IR by providing more concrete use cases 
for things like inserting sync points in generated code and such.