[LLVMdev] Improving Garbage Collection
viridia at gmail.com
Wed Jul 20 12:39:35 PDT 2011
On Wed, Jul 20, 2011 at 11:20 AM, Andrew Trick <atrick at apple.com> wrote:
> On Jul 19, 2011, at 10:06 AM, Peter Lawrence wrote:
> how about having the front-end generate an llvm.safe.point()
> intrinsic call at
> the desired safe points, and having the addresses of the GC roots (at that
> can vary from call to call) be the parameters (with noescape attribute) to
> the intrinsic,
> IIUC currently the GC roots are tagged, and all analysis and transform
> have to special case these tagged objects, but if instead their addresses
> were taken at
> the safe points no special casing would have to be done -- all analysis and
> optimizations already know how to deal with objects whose address is taken,
> and since llvm does already have a "noescape" (not sure that's the correct
> attribute for parameters, these addresses won't be misinterpreted by any
> analysis either, and llvm is free to go ahead and keep these values in
> between safe points -- you can stop asking how to allow GC roots as SSA
> any traditional load-store optimization pass will do it for you for free.
> *** without you having to insert explicit load and store instructions, and
> having to
> somehow mark them as non-delete-able, or always omit optimization passes
> you would otherwise like to have enabled ***
> and also without you having to store NULL into a safe point to end its
> and I would suggest eliminating the gcroot() intrinsic as it's information
> would be redundant.
> thoughts, comments ???
> -Peter Lawrence.
> It is helpful to think of the stack/register map generation in two
> phases: before and after identifying the location of heap
> pointers. Say PrePtrMap and PostPtrMap. We don't need to know stack
> offsets and register names at that point, but do we need a 1-1 mapping
> from pointer values to identifiable physical locations. I deliberately
> avoid calling these values roots here, because we may have multiple
> live pointers derived from an object and multiple copies of the same
> pointer. GC only needs to see one of these values to trace roots, but
> each of these still needs its own entry in the stack/register map for
> a moving collector.
> PrePtrMap, we need a type system to precisely tag all values known to
> contain valid heap pointers. No potentially uninitialized/undefined
> values allowed here. IntPtr/PtrInt casts become control dependent.
> PostPtrMap, we need an IR that represents mapped pointers as live-in
> to safepoints, *and* safepoints need to be defined as clobbering all
> locations that may contain a pointer. For example, if we have a
> register map, then we can no longer move and add instruction across a
> call if it may operate on a pointer. Obviously, it's easier to avoid
> invalidating a stack map, but fundamentally the same problem. No
> amount of spilling can bail you out without updating the map.
> The current gcroot solution works around the lack of LLVM type system
> support for heap pointers by effectively mapping pointers very early
> (in the front end), reloading roots after safepoints (so we only need
> one map entry per root), and relying on the rules that allows callees
> to write their caller's stack under certain circumstances (someone
> needs to explain these rules to me--is it only possible when an alloca
> pointer is taken?).
> Your proposal is exactly the same in terms of when the heap pointers
> are identified. That leaves all of the LLVM optimizer and codegen
> running in PostPtrMap mode. The problem is that LLVM is free to make
> copies of pointers and optimize across call sites without knowing how to
> update the map.
> The most efficient way to support GC is to move identification of
> pointer locations as late as possible. Optimization across safepoints
> needs to be effectively disabled after that point.
Achievement unlocked: People who are smarter than me and more knowledgeable
about the LLVM backends are arguing over the details of how GC ought to work
> On Jul 18, 2011, at 10:53 PM, Talin wrote:
> On Mon, Jul 18, 2011 at 11:00 AM, Peter Lawrence <
> peterl95124 at sbcglobal.net> wrote:
>> do you identify safe-points in the current or proposed llvm
>> scheme, and if so how,
>> or are they implicit as being at all call sites (which begs the question
>> what about leaves
>> in the call tree, how does GC get started at all in that case).
>> The LLVM linker has a feature where you can specify what kind of safe
> points your collector requires - the options are Loop, Return, PreCall and
> PostCall. You can also override this behavior and examine each instruction
> and return a boolean indicating whether it is or isn't a safe point.
> Currently I only have function calls as safe points, although I may
> eventually enable loops as well. As far as leaf functions go, consider that
> the call to allocate memory is also a safe point - and if a function doesn't
> allocate any memory then we don't care if the GC is involved or not.
> One complication with the current scheme is that the frontend has to have a
> sense of where the safe points are going to be. Because the current scheme
> requires the frontend to insert additional loads and stores around safe
> points (for spilling register values to memory so they can be traced), the
> frontend has to be able to guess which function call might be a safe point -
> but it can't know for sure due to the fact that optimization and inlining
> (which happens much later) may cause the removal of the actual call
> instruction. The safe but inefficient approach is to insert the extra loads
> and stores around every call instruction.
> Peter Lawrence.
> -- Talin
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev