[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Fri Feb 27 09:56:56 PST 2009

Gordon Henriksen wrote:
> Hi Mark,
> 
> I don't think anyone will dispute that it's easier to hack up a shadow  
> stack (or plug into a conservative collector) to get up and running  
> with GC. That is absolutely the route to go if portability trumps  
> performance.

Why? LLVM is all about portability AND performance.

> 
> If you review the mailing list history, I think you'll also find that  
> developers who do care about performance have been disappointed with  
> the impact of using a shadow stack, either managed with LLVM  
> intrinsics or by hand. Even the current state of LLVM GC (static stack  
> maps) is a significant performance improvement—but it absolutely does  
> require support from the code generator. Return addresses must be  
> mapped to stack maps, and only the code generator knows where return  
> addresses lie and how the stack frame is laid out.

I agree that the code generator should provide information about
stack layout, and that it must be possible to inform the optimisation
passes that objects at certain memory locations may be moved by the
collector.

But information about stack layout is useful beyond GC; it would also
help interactive debugging, for example.
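To make the stack-map idea concrete: at a safe point the collector walks the stack and uses each return address it finds to look up which slots of the calling frame hold live pointers. Below is a minimal C sketch of that lookup under stated assumptions: the table layout and the names `struct stack_map_entry`, `find_stack_map` and `visit_frame_roots` are hypothetical and are not LLVM's actual stack map format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-call-site record a code generator could emit.
   This is NOT LLVM's actual stack map format. */
struct stack_map_entry {
    uintptr_t  return_address; /* address just after the call (the safe point) */
    size_t     frame_size;     /* size of the caller's frame, in bytes */
    size_t     num_roots;      /* number of slots holding live pointers */
    const int *root_offsets;   /* byte offsets of those slots from the frame base */
};

/* Binary search over a table sorted by return_address.
   Returns NULL if ra is not a recorded safe point. */
const struct stack_map_entry *
find_stack_map(const struct stack_map_entry *table, size_t n, uintptr_t ra)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].return_address < ra)
            lo = mid + 1;
        else if (table[mid].return_address > ra)
            hi = mid;
        else
            return &table[mid];
    }
    return NULL;
}

/* Calls visit on the address of each live pointer slot in one frame.
   A collector would trace (and, if copying, update) each slot; a debugger
   could use the same table to display them. Offsets are assumed to be
   multiples of the pointer alignment. */
void
visit_frame_roots(const struct stack_map_entry *e, char *frame_base,
                  void (*visit)(void **slot))
{
    assert(e != NULL);
    for (size_t i = 0; i < e->num_roots; i++)
        visit((void **)(frame_base + e->root_offsets[i]));
}
```

In this sketch nothing in the table is GC-specific; only the visitor passed to `visit_frame_roots` decides whether the slots are traced by a collector or printed by a debugger.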

Intrinsics should be named for their function, not for their presumed usage.

> 
> The ultimate end goal is to support schemes with still-lower execution  
> overhead. The next step for LLVM GC would be elimination of the reload  
> penalty for using GC intrinsics with a copying collector. This, again,  
> requires that the code generator perform bookkeeping for GC pointers.

Elimination of the reload penalty is impossible, unless the GC can be 
informed about traceable objects in registers.
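To see why the reload is needed, consider a copying collector that can only update the roots it knows about. The following is a toy C sketch under stated assumptions: `register_root` and `toy_move_roots` are hypothetical names, the "heap" is a plain array of ints, a variable registered with `register_root` plays the role of a stack slot marked with `llvm.gcroot`, and a separate local copy of the pointer stands in for a value held in a register across a safe point.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical root table: the only pointer locations the toy collector
   can see and update. Pointers held in registers are not listed here. */
#define MAX_ROOTS 8
static int **root_slots[MAX_ROOTS];
static size_t num_roots;

void register_root(int **slot)
{
    assert(num_roots < MAX_ROOTS);
    root_slots[num_roots++] = slot;
}

/* Toy moving collection: copy each rooted object (a single int) into
   to_space and overwrite the root slot with the new address. It keeps no
   forwarding pointers, so an object reachable from two roots would be
   copied twice; a real copying collector would forward the old copy. */
void toy_move_roots(int *to_space, size_t capacity)
{
    assert(num_roots <= capacity);
    for (size_t i = 0; i < num_roots; i++) {
        to_space[i] = **root_slots[i]; /* copy the object */
        *root_slots[i] = &to_space[i]; /* update the root slot */
    }
}
```

After `toy_move_roots` runs, the registered slot holds the new address while the cached copy still points at the old object, so compiled code must reload from the slot after every safe point unless the stack map can also describe which registers hold pointers.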

> 
> I'm not sure where such vociferous concern on this subject arises. All  
> the extant collector plugins I'm aware of operate in conjunction with  
> the target-independent framework and require exactly zero code within  
> each target backend.

No collector plugins actually use the barrier intrinsics llvm.gcread and
llvm.gcwrite, since there are no generational collectors for LLVM (as yet).

According to the documentation
(http://llvm.org/docs/GarbageCollection.html#runtime), the GC interface
is "a work in progress".

The semantics of llvm.gcroot are vague:
"At compile-time, the code generator generates information to allow the 
runtime to find the pointer at GC safe points."

Vague, ill-specified interfaces are worse than none.

Fundamentally, implementers of new back-ends shouldn't have to worry 
about GC, and implementers of GC algorithms should not have to delve 
into the internals of the back-end.

Mark.