[LLVMdev] Garbage collection

Fri Feb 27 09:10:19 PST 2009

Jon Harrop wrote:
> On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
>   
>> In my ideal world, this would be:
>>
>> 1. Subsystems [with clean interfaces] for thread management,
>> finalization, object model interactions, etc.
>> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
>> there can be replaceable policy components etc.
>> 3. A couple of actual GC implementations built on top of #1/2.
>> Ideally there would only be a couple of high-level collectors that can
>> be parameterized by replacing subsystems and policies.
>> 4. A very simple language implementation that uses the facilities, on
>> the order of complexity as the kaleidoscope tutorial.
>>
>> As far as I know, there is nothing that prevents this from happening
>> today, we just need leadership in the area to drive it.  To avoid the
>> "ivory tower" problem, I'd strongly recommend starting with a simple
>> GC and language and get the whole thing working top to bottom. From 
>> there, the various pieces can be generalized out etc.  This ensures
>> that there is always a *problem being solved* and something that works
>> and is testable.
>>     
>
> I fear that the IR generator and GC are too tightly coupled.
>
> For example, the IR I am generating shares pointers read from the heap even 
> across function calls. That is built on the assumption that the pointers are 
> immutable and, therefore, that the GC is non-moving. The generated code is 
> extremely efficient even though I have not even enabled LLVM's optimizations 
> yet precisely because of all this shared immutable data.
>
> If you wanted to add a copying GC to my VM you would probably replace every 
> lookup of the IR register with a lookup of the code to reload it, generating 
> a lot of redundant loads that would greatly degrade performance so you would 
> rely upon LLVM's optimization passes to clean it up again. However, I bet 
> they do not have enough information to recover all of the lost performance. 
> So there is a fundamental conflict here where a simple GC design decision has 
> a drastic effect on the IR generator.
>
> Although it is theoretically possible to parameterize the IR generator 
> sufficiently to account for all possible combinations of GC designs I suspect 
> the result would be a mess. Consequently, perhaps it would be better to 
> consider IR generation and the GC as a single entity and, instead, factor 
> them both out using a common high-level representation not dissimilar to JVM 
> or CLR bytecode in terms of functionality but much more closely related to 
> LLVM's IR?
>
>   
Most copying collector designs that I have seen rely on explicit
coordination from the mutator, which means that object addresses can't
change at arbitrary times, but only at a "sync point". It's up to
the mutator to call "sync" at fairly regular intervals (calls to
allocate memory for an object are implicitly sync points as well).
During that call the heap may get rearranged, but between sync points
pointer values are stable. So you can still share pointer values most of
the time; only pointers that are live across a sync point have to be
reloaded.
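
To make the sync-point idea concrete, here is a minimal sketch of a toy
copying heap, written in Python only for brevity. Everything in it (Ref,
Heap, push_root, load, alloc, sync, demo) is a hypothetical name made up
for illustration and is not part of LLVM or any real collector. It
assumes addresses are plain indices into a list and that the mutator
registers the pointers it wants kept alive as root slots.

```python
class Ref:
    """Marks a field as a pointer (heap address) rather than plain data."""
    def __init__(self, addr):
        self.addr = addr


class Heap:
    """Toy copying heap; an address is an index into self.space."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.space = []   # live semispace: one list of fields per object
        self.roots = []   # root slots: one-element lists holding an address

    def push_root(self, addr):
        slot = [addr]
        self.roots.append(slot)
        return slot

    def load(self, addr, index):
        return self.space[addr][index]

    def alloc(self, fields):
        # Allocation is an implicit sync point: it may collect when full.
        if len(self.space) >= self.capacity:
            self.sync()
            if len(self.space) >= self.capacity:
                raise MemoryError("toy heap exhausted")
        self.space.append(list(fields))
        return len(self.space) - 1

    def sync(self):
        """Explicit sync point: copy live objects, updating roots and fields."""
        old, new, forward = self.space, [], {}

        def copy(addr):
            # Copy an object once and remember its forwarding address.
            if addr not in forward:
                forward[addr] = len(new)
                new.append(old[addr])
            return forward[addr]

        for slot in self.roots:
            slot[0] = copy(slot[0])
        scan = 0
        while scan < len(new):   # Cheney-style scan of copied objects
            new[scan] = [Ref(copy(f.addr)) if isinstance(f, Ref) else f
                         for f in new[scan]]
            scan += 1
        self.space = new


def demo():
    heap = Heap()
    heap.alloc(["garbage"])            # address 0, never rooted
    child = heap.alloc([42])           # address 1
    parent = heap.alloc([Ref(child)])  # address 2
    root = heap.push_root(parent)

    # Between sync points the raw addresses are stable and can be shared.
    before = heap.load(heap.load(parent, 0).addr, 0)

    heap.sync()                        # objects may move here
    moved = root[0] != parent          # the root slot was rewritten
    parent = root[0]                   # reload once after the sync point
    after = heap.load(heap.load(parent, 0).addr, 0)
    return before, moved, after
```

In this sketch the mutator uses the raw address freely until it calls
sync, and afterwards reloads it once from the root slot. In a code
generator the analogous cost would be one reload per value that is live
across a call that may be a sync point, rather than a reload on every
use.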

-- Talin