[LLVMdev] Interfacing llvm with a precise, relocating GC

Mon Oct 28 11:03:35 PDT 2013

On 10/26/13 7:40 AM, Filip Pizlo wrote:
> You can implement a copying GC (what the kids these days call 
> relocating) without accurate roots.
I use "relocating" to brush over the distinction between "copying" and 
"compacting" collectors.  For the purposes of our discussions, the two 
are interchangeable though.
> Why aren't you just using the well-known Bartlett style of GC, which 
> allows for relocation even in the presence of conservative roots (or 
> accurate roots that don't allow relocation)?
You've brought this up a few times, so I want to go ahead and address 
your point in detail.

Background for others following this discussion: A Bartlett style 
garbage collector conservatively scans the stack and promotes* any 
object reachable from a possible heap pointer in place.  Objects 
directly reachable from the stack are not relocated.  Objects reachable 
from these objects are treated normally by the relocating collector and 
may be moved.   To support in place promotion, a small amount of extra 
heap metadata is generally required.

* "Promote" is used to indicate the object is conceptually moved into 
the "to space".  This is not the same promotion used when discussing 
generational collectors.

I agree that Bartlett collectors have significant advantages with 
regards to compatibility with legacy code and compilers.  A Bartlett 
collector does not require derived pointer tracking inside the 
compiler.  A Bartlett collector simplifies interfacing with code in 
other languages (i.e. C/C++) since it has far fewer assumptions about 
stack layout.

Unfortunately, Bartlett collectors also have a number of serious 
disadvantages.  To name a few:
- False Retention - Due to conservative stack scanning, objects which 
are not live may be falsely retained.  (i.e. false roots or dead roots)
- External Fragmentation - Promoted-in-place objects can be anywhere in 
the heap.  This introduces fragmentation and complicates both allocation 
and collection.
- Internal Fragmentation - Promoted-in-place objects cause the retention 
of the entire "page" containing them.  Unless using a free-list style 
allocator, all of this space is inaccessible.
- Garbage Drag - Since falsely retained data can point to other heap 
objects, an unknown fraction of the heap may be falsely kept live. This 
is particularly problematic for programs with extreme heap usage.  
Consider a program which maintains a large B-tree in memory with 
updates.  Or a program manipulating large graphs.  In either case, a 
single stray pointer can force the retention of large portions of the 
heap graph.  (In practice, this does happen.)

I do acknowledge that each of these is individually surmountable. 
However, doing so requires significant work beyond the basic design 
described in Bartlett's initial work.  Given the complexity of 
engineering a production grade garbage collector, this additional 
complexity is highly undesirable.

Taking a step back from the technical merits, I have one other 
overriding reason for not pursuing a Bartlett style approach: we already 
have a fully precise relocating collector.  We are not interested in 
abandoning this.  From our side, the only question is whether LLVM is 
suitable for use with our existing collector.

I'm happy to keep debating the merits of various approaches with you 
(this is fun!), but from a practical perspective, using an alternate 
approach to garbage collection is simply a non-starter for us.

Yours,
Philip