[LLVMdev] Future plans for GC in LLVM

Tue Dec 9 16:10:50 PST 2014

On 12/08/2014 04:02 PM, Russell Hadley wrote:
> Hi,
>
> This looks great.  I'm excited to see this come in and the direction looks right to me.  I've started to look at how to use LLVM to target the MS CLR GC (now being open sourced), and the statepoints are much closer to the current implementation, so that could make my life easier.  On your questions, add my vote for the ability to implement a custom binary format for safepoints, since it means I can just target the CLR's format.
It seems like there's a strong desire to preserve the custom binary 
format mechanism.  I wasn't really expecting that, but I see no real 
downsides other than some minor code complexity.

> I've been trying to keep up with your progress, but since I'm coming out of lurk mode I have a few questions:
>
> - From the documentation it looks like you're using the patchpoint stackmap format (http://llvm.org/docs/StackMaps.html#stackmap-format).  In that format you can describe register locations, but the overview (http://llvm.org/docs/Statepoints.html#overview) implies that all gc pointers are spilled to the stack.  Is the spilling to memory required?  Or is the plan to allow gc pointers to reside in registers as well?  (I'm hoping that a store/load at safepoints won't be required and that they can stay register resident.)
At the moment, we will eagerly spill and the stack map will only contain 
stack slots.  My hope is in the not too distant future to extend the 
backend infrastructure to allow accurate reporting of gc pointers in 
registers.  The format specification already supports this, we're just 
not able to actually exploit that in the backend yet.
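
To make the distinction concrete, here is a rough sketch of what "only stack slots" means in terms of the stackmap location kinds.  The kind numbering follows the StackMaps documentation linked above; the Location struct and the allSpilled helper are hypothetical names for illustration only, not LLVM's API and not the on-disk record layout.  I'm also assuming that a spilled gc pointer shows up as an Indirect location.

```cpp
#include <cstdint>
#include <vector>

// Location kinds as numbered in the StackMaps documentation.
enum class LocKind : std::uint8_t {
  Register = 1,   // value lives in a register
  Direct = 2,     // value is the address reg + offset
  Indirect = 3,   // value is stored in memory at [reg + offset], i.e. a spill slot
  Constant = 4,   // small constant encoded inline
  ConstIndex = 5  // index into the constant pool
};

// Hypothetical in-memory view of one location entry (not the binary layout).
struct Location {
  LocKind kind;
  std::uint16_t dwarfReg;  // DWARF register number, e.g. 7 is RSP on x86-64
  std::int32_t offset;     // byte offset from that register
};

// Returns true if every gc pointer recorded for a safepoint lives in a
// stack slot, which is what the eager-spilling lowering described above
// would produce under the assumption stated in the lead-in.
bool allSpilled(const std::vector<Location>& locs) {
  for (const Location& loc : locs) {
    if (loc.kind != LocKind::Indirect) {
      return false;
    }
  }
  return true;
}
```

A consumer written against the full format would also have to handle the Register kind once the backend can report register-resident gc pointers.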
> - I'm still fuzzy on how code motion is blocked from moving SSA uses past the safepoints once they've been inserted.  I'm likely just missing some invariant in LLVM or the design, since I can't seem to noodle it out from what I've seen.
I think Sanjoy's response did a pretty good job on this one.  If it's 
still unclear, let me know.
> - In the CLR GC we don't require the base object pointer to be kept alive for a derived managed pointer (interior pointer), but in your design there is a requirement to maintain a base, derived pairing.  (If I remember right, this is a Java requirement.)  Is this a hard requirement?  Or is there the potential for other collectors to deal just with managed pointers?
I'm a little unclear on what you're trying to ask here.  A pointer 
associated with an object (say, the address of a field) must keep the 
object alive; doing otherwise would create use-after-free errors.  I'm 
guessing that you simply don't keep interior pointers live across a 
safepoint?  That's fine and everything should work normally.

If - at the safepoint - all of your pointers are base pointers, then you 
can simply list it for both the base and derived fields in the 
gc.relocate.  This mechanism definitely works; it's a pretty common case 
for any compiled code.
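
As a toy illustration of why the pairing exists, and why passing the same pointer for both fields is harmless, consider a moving collector that preserves the derived pointer's offset from its base.  RelocPair and relocateDerived are hypothetical names for this sketch; this is not LLVM's API, and a real collector may well compute the new address differently.

```cpp
#include <cstdint>

// Hypothetical model of one (base, derived) pair at a safepoint.
struct RelocPair {
  std::uintptr_t base;     // address of the start of the object
  std::uintptr_t derived;  // address somewhere inside that object
};

// Given the address the collector moved the base object to, recompute
// the derived pointer by keeping its offset from the base unchanged.
std::uintptr_t relocateDerived(const RelocPair& p, std::uintptr_t newBase) {
  return newBase + (p.derived - p.base);
}
```

When base and derived are the same pointer, the offset is zero and the relocated value is simply the new base address, which is the all-base-pointers case described above.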

The other case you might be referring to is so-called 'contained 
objects'.  (i.e. one gc managed object embedded within another, but 
whose lifetimes are distinct)  This is a more complicated case, so 
unless this is actually what you're getting at, I'm going to avoid 
explaining all the complexities.  :)
>
> Thanks,
>
> -R
>
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Philip Reames
> Sent: Thursday, December 4, 2014 5:50 PM
> To: LLVM Developers Mailing List
> Cc: rayiner at gmail.com; gordonhenriksen at mac.com
> Subject: [LLVMdev] Future plans for GC in LLVM
>
> Now that the statepoint changes have landed, I wanted to start a discussion about what's next for GC support in LLVM.  I'm going to sketch out a strawman proposal, but I'm not set on any of this.  I mostly just want to draw interested parties out of the woodwork.  :)
>
> Overall Direction:
> In the short term, my intent is to preserve the functionality of the existing code, but migrate towards a position where the gcroot specific pieces are optional and well separated.  I also plan to start updating the documentation to reflect a separation between the general support for garbage collection (function attributes, identifying references, load and store barrier lowering, generating stack maps) and the implementation choices (gcroot & its lowering vs statepoints & addr spaces for identifying references).
>
> Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering code and in tree GCStrategies unless an interested party speaks up.  I have no problem with retaining some of the existing pieces for legacy support or helping users to migrate, but as of right now, I don't know of any such active users.  The only exception to this might be the shadow stack GC.  Eventually in this context is at least six months from now, but likely less than 18 months.  Hopefully, that's vague enough.  :)
>
> HELP - If anyone knows which OCaml implementation and which Erlang implementation triggered the in tree GC strategies, please let me know!
>
>
> Near Term Changes:
> - Migrate ownership of GCStrategy objects from GCModuleInfo to
> LLVMContext.  In theory, this loses the ability for two different
> Modules to have the same collector with different state, but I know of
> no use case for this.
> - Modify the primary Function::getGC/setGC interface to return a
> reference to the GCStrategy object, not a string.  I will provide a
> Function::setGCString and getGCString.
> - Extend the GCStrategy class to include a notion of which compilation
> strategy is being used.  The two choices right now will be Legacy and
> Statepoint.  (Longer term, this will likely become a more fine grained
> choice.)
> - Separate GCStrategy and related pieces from the
> GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At first,
> this will simply mean clarifying documentation and rearranging code a bit.
> - Document/clarify the callbacks used to customize the lowering. Decide
> which of these make sense to preserve and document.
>
> (Lest anyone get the wrong idea, the above changes are intended to be
> minor cleanup.  I'm not looking to do anything controversial yet.)
>
> Questions:
> - Is providing the ability to generate a custom binary stack map format a
> valuable feature?  Adapting the new statepoint infrastructure to work
> with the existing GCMetadataPrinter classes wouldn't be particularly hard.
> - Are there any GCs out there that need gcroot's single stack slot per
> value implementation?   By default, statepoints may generate a different
> stackmap for every safepoint in a function.
> - Is using gcroot and allocas to mark pointers as garbage collected
> references valuable?  (As opposed to using address spaces on the SSA
> values themselves?)  Long term, should we retain the gcroot marker
> intrinsics at all?
>
>
> Philip
>
> Appendix: The Current Implementation's Key Classes:
>
> GCStrategy - Provides a configurable description of the collector. The
> strategy can also override parts of the default GC root lowering
> strategy.  The concept of such a collector description is very valuable,
> but the current implementation could use some cleanup.  In particular,
> the custom lowering hooks are a bit of a mess.
>
> GCMetadataPrinter - Provides a means to dump a custom binary format
> describing each function's safepoints.  All safepoints in a function must
> share a single root Value to stack slot mapping.
>
> GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved
> to enable GCMetadataPrinter.
>