[LLVMdev] Improving Garbage Collection

Wed Jul 20 11:20:10 PDT 2011

On Jul 19, 2011, at 10:06 AM, Peter Lawrence wrote:

> Talin,
>           how about having the front-end generate an llvm.safe.point() intrinsic call at
> the desired safe points,  and having the addresses of the GC roots (at that point,
> can vary from call to call) be the parameters (with noescape attribute) to the intrinsic,
> 
> IIUC currently the GC roots are tagged, and all analysis and transform optimizations
> have to special case these tagged objects, but if instead their addresses were taken at
> the safe points no special casing would have to be done -- all analysis and transform
> optimizations already know how to deal with objects whose address is taken,
> 
> and since llvm does already have a "noescape" (not sure that's the correct name?)
> attribute for parameters, these addresses won't be misinterpreted by any alias
> analysis either, and llvm is free to go ahead and keep these values in registers
> between safe points  -- you can stop asking how to allow GC roots as SSA values,
> any traditional load-store optimization pass will do it for you for free.
> 
> *** without you having to insert explicit load and store instructions, and having to
> somehow mark them as non-delete-able, or always omit optimization passes that
> you would otherwise like to have enabled ***
> 
> and also without you having to store NULL into a safe point to end its lifetime.
> and I would suggest eliminating the gcroot() intrinsic as it's information content
> would be redundant.
> 
> 
> thoughts, comments ???
> 
> 
> -Peter Lawrence.

It is helpful to think of the stack/register map generation in two
phases: before and after identifying the location of heap
pointers. Say PrePtrMap and PostPtrMap. We don't need to know stack
offsets and register names at that point, but do we need a 1-1 mapping
from pointer values to identifiable physical locations. I deliberately
avoid calling these values roots here, because we may have multiple
live pointers derived from an object and multiple copies of the same
pointer. GC only needs to see one of these values to trace roots, but
each of these still needs its own entry in the stack/register map for
a moving collector.

PrePtrMap, we need a type system to precisely tag all values known to
contain valid heap pointers. No potentially uninitialized/undefined
values allowed here. IntPtr/PtrInt casts become control dependent.

PostPtrMap, we need an IR that represents mapped pointers as live-in
to safepoints, *and* safepoints need to be defined as clobbering all
locations that may contain a pointer. For example, if we have a
register map, then we can no longer move and add instruction across a
call if it may operate on a pointer. Obviously, it's easier to avoid
invalidating a stack map, but fundamentally the same problem. No
amount of spilling can bail you out without updating the map.

The current gcroot solution works around the lack of LLVM type system
support for heap pointers by effectively mapping pointers very early
(in the front end), reloading roots after safepoints (so we only need
one map entry per root), and relying on the rules that allows callees
to write their caller's stack under certain circumstances (someone
needs to explain these rules to me--is it only possible when an alloca
pointer is taken?).

Your proposal is exactly the same in terms of when the heap pointers
are identified. That leaves all of the LLVM optimizer and codegen
running in PostPtrMap mode. The problem is that LLVM is free to make
copies of pointers and optimize across call sites without knowing how to
update the map.

The most efficient way to support GC is to move identification of
pointer locations as late as possible. Optimization across safepoints
needs to be effectively disabled after that point.

-Andy

> 
> 
> 
> 
> On Jul 18, 2011, at 10:53 PM, Talin wrote:
> 
>> On Mon, Jul 18, 2011 at 11:00 AM, Peter Lawrence <peterl95124 at sbcglobal.net> wrote:
>> Talin,
>>           do you identify safe-points in the current or proposed llvm scheme, and if so how,
>> or are they implicit as being at all call sites (which begs the question what about leaves
>> in the call tree, how does GC get started at all in that case).
>> 
>> The LLVM linker has a feature where you can specify what kind of safe points your collector requires - the options are Loop, Return, PreCall and PostCall. You can also override this behavior and examine each instruction and return a boolean indicating whether it is or isn't a safe point.
>> 
>> Currently I only have function calls as safe points, although I may eventually enable loops as well. As far as leaf functions go, consider that the call to allocate memory is also a safe point - and if a function doesn't allocate any memory then we don't care if the GC is involved or not.
>> 
>> One complication with the current scheme is that the frontend has to have a sense of where the safe points are going to be. Because the current scheme requires the frontend to insert additional loads and stores around safe points (for spilling register values to memory so they can be traced), the frontend has to be able to guess which function call might be a safe point - but it can't know for sure due to the fact that optimization and inlining (which happens much later) may cause the removal of the actual call instruction. The safe but inefficient approach is to insert the extra loads and stores around every call instruction.
>> 
>> Peter Lawrence.
>> 
>> -- 
>> -- Talin
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110720/c2e6139b/attachment.html>