[LLVMdev] Future plans for GC in LLVM

Thu Dec 4 17:50:14 PST 2014

Now that the statepoint changes have landed, I wanted to start a 
discussion about what's next for GC support in LLVM.  I'm going to 
sketch out a strawman proposal, but I'm not set on any of this.  I 
mostly just want to draw interested parties out of the woodwork.  :)

Overall Direction:
In the short term, my intent is to preserve the functionality of the 
existing code, but migrate towards a position where the gcroot specific 
pieces are optional and well separated.  I also plan to start updating 
the documentation to reflect a separation between the general support 
for garbage collection (function attributes, identifying references, 
load and store barrier lowering, generating stack maps) and the 
implementation choices (gcroot & its lowering vs statepoints & address 
spaces for identifying references).

Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering 
code and in-tree GCStrategies unless an interested party speaks up.  I 
have no problem with retaining some of the existing pieces for legacy 
support or helping users to migrate, but as of right now, I don't know 
of any such active users.  The only exception to this might be the 
shadow stack GC.  "Eventually" in this context means at least six months 
from now, but likely less than 18 months.  Hopefully, that's vague enough.  :)

HELP - If anyone knows which OCaml implementation and which Erlang 
implementation triggered the in-tree GC strategies, please let me know!

Near Term Changes:
- Migrate ownership of GCStrategy objects from GCModuleInfo to 
LLVMContext.  In theory, this loses the ability for two different 
Modules to have the same collector with different state, but I know of 
no use case for this.
- Modify the primary Function::getGC/setGC interface to return a 
reference to the GCStrategy object, not a string.  I will provide a 
Function::setGCString and getGCString.
- Extend the GCStrategy class to include a notion of which compilation 
strategy is being used.  The two choices right now will be Legacy and 
Statepoint.  (Longer term, this will likely become a more fine grained 
choice.)
- Separate GCStrategy and related pieces from the 
GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At first, 
this will simply mean clarifying documentation and rearranging code a bit.
- Document/clarify the callbacks used to customize the lowering. Decide 
which of these make sense to preserve and document.
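
To make the first three items concrete, here is a rough sketch of the 
shape I have in mind.  Everything in it is a hypothetical stand-in 
(ToyContext, ToyGCStrategy, ToyFunction, getOrCreateStrategy are made up 
for illustration and are not the real LLVM classes or signatures), and 
the rule that a strategy created by name defaults to Legacy is purely an 
assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins only; none of these are the real LLVM classes.
enum class CompilationStrategy { Legacy, Statepoint };

class ToyGCStrategy {
public:
  ToyGCStrategy(std::string Name, CompilationStrategy Kind)
      : Name(std::move(Name)), Kind(Kind) {}
  const std::string &getName() const { return Name; }
  CompilationStrategy getKind() const { return Kind; }

private:
  std::string Name;
  CompilationStrategy Kind;
};

// The context (not the module) owns one strategy object per collector name.
class ToyContext {
public:
  // Kind is only consulted the first time a name is seen (an assumption
  // made for this sketch).
  ToyGCStrategy &getOrCreateStrategy(
      const std::string &Name,
      CompilationStrategy Kind = CompilationStrategy::Legacy) {
    auto It = Strategies.find(Name);
    if (It == Strategies.end())
      It = Strategies
               .emplace(Name, std::unique_ptr<ToyGCStrategy>(
                                  new ToyGCStrategy(Name, Kind)))
               .first;
    return *It->second;
  }

private:
  std::map<std::string, std::unique_ptr<ToyGCStrategy>> Strategies;
};

class ToyFunction {
public:
  explicit ToyFunction(ToyContext &Ctx) : Ctx(Ctx) {}

  bool hasGC() const { return Strategy != nullptr; }

  // Primary interface: hands back the strategy object itself.
  ToyGCStrategy &getGC() const {
    assert(Strategy && "function has no GC strategy");
    return *Strategy;
  }
  void setGC(ToyGCStrategy &S) { Strategy = &S; }

  // String-based convenience wrappers.
  void setGCString(const std::string &Name) {
    Strategy = &Ctx.getOrCreateStrategy(Name);
  }
  std::string getGCString() const {
    return Strategy ? Strategy->getName() : std::string();
  }

private:
  ToyContext &Ctx;
  ToyGCStrategy *Strategy = nullptr;
};
```

With this shape, two functions that name the same collector end up 
pointing at the same strategy object, which is exactly the sharing (and 
the loss of per-Module state) described in the first item.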

(Lest anyone get the wrong idea, the above changes are intended to be 
minor cleanup.  I'm not looking to do anything controversial yet.)

Questions:
- Is providing the ability to generate a custom binary stack map format a 
valuable feature?  Adapting the new statepoint infrastructure to work 
with the existing GCMetadataPrinter classes wouldn't be particularly hard.
- Are there any GCs out there that need gcroot's single stack slot per 
value implementation?  By default, statepoints may generate a different 
stackmap for every safepoint in a function.
- Is using gcroot and allocas to mark pointers as garbage collected 
references valuable?  (As opposed to using address spaces on the SSA 
values themselves?)  Long term, should we retain the gcroot marker 
intrinsics at all?
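
To illustrate the second question, here is a toy model of the 
difference.  It is only a sketch under simplifying assumptions: a 
"location" is reduced to a frame offset, values are identified by name, 
and SafepointRoots / collapseToSingleSlots are hypothetical names, not 
anything in the tree.  The function checks whether a set of 
per-safepoint maps happens to be expressible as one shared value-to-slot 
mapping, which is what a gcroot-style consumer would need.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: a location is just a frame offset in bytes.
using FrameOffset = int;

// One safepoint's view of the frame: live GC value name -> frame offset.
using SafepointRoots = std::map<std::string, FrameOffset>;

// Returns true if every value sits at the same offset at every safepoint
// where it is live; in that case Shared holds the single shared mapping.
// On failure, Shared is left partially filled and should be ignored.
bool collapseToSingleSlots(const std::vector<SafepointRoots> &Safepoints,
                           SafepointRoots &Shared) {
  Shared.clear();
  for (const SafepointRoots &SP : Safepoints) {
    for (const auto &Entry : SP) {
      auto Inserted = Shared.insert(Entry);
      // The value was seen before, but at a different offset.
      if (!Inserted.second && Inserted.first->second != Entry.second)
        return false;
    }
  }
  return true;
}
```

A collector that can only consume the shared form would need the check 
above to succeed for every function; one that reads a map per safepoint 
does not care.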

Philip

Appendix: The Current Implementation's Key Classes:

GCStrategy - Provides a configurable description of the collector. The 
strategy can also override parts of the default GC root lowering 
strategy.  The concept of such a collector description is very valuable, 
but the current implementation could use some cleanup.  In particular, 
the custom lowering hooks are a bit of a mess.

GCMetadataPrinter - Provides a means to dump a custom binary format 
describing each function's safepoints.  All safepoints in a function must 
share a single root Value to stack slot mapping.

GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved 
to enable GCMetadataPrinter.
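
As a rough picture of how the saved metadata and the printer relate, 
here is a self-contained toy.  The names (ToyFunctionInfo, 
ToyMetadataPrinter, ToyCompactPrinter) and the byte layout are invented 
for this sketch; they do not match the real classes or any real 
collector's format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-function metadata that gets saved.
struct ToyFunctionInfo {
  std::vector<int32_t> RootOffsets;     // one frame offset per root, shared
                                        // by all safepoints in the function
  std::vector<uint32_t> SafepointAddrs; // code offsets of the safepoints
};

// Hypothetical printer interface: each collector supplies its own format.
class ToyMetadataPrinter {
public:
  virtual ~ToyMetadataPrinter() = default;
  virtual void emit(const ToyFunctionInfo &FI,
                    std::vector<uint8_t> &Out) const = 0;
};

// Appends V to Out as four little-endian bytes.
inline void writeU32LE(uint32_t V, std::vector<uint8_t> &Out) {
  for (int Shift = 0; Shift < 32; Shift += 8)
    Out.push_back(static_cast<uint8_t>((V >> Shift) & 0xFF));
}

// Made-up layout: safepoint count, safepoint addresses, root count,
// root offsets, all as little-endian 32-bit words.
class ToyCompactPrinter : public ToyMetadataPrinter {
public:
  void emit(const ToyFunctionInfo &FI,
            std::vector<uint8_t> &Out) const override {
    writeU32LE(static_cast<uint32_t>(FI.SafepointAddrs.size()), Out);
    for (uint32_t Addr : FI.SafepointAddrs)
      writeU32LE(Addr, Out);
    writeU32LE(static_cast<uint32_t>(FI.RootOffsets.size()), Out);
    for (int32_t Off : FI.RootOffsets)
      writeU32LE(static_cast<uint32_t>(Off), Out);
  }
};
```

Note that the root list is written once per function rather than once 
per safepoint, which mirrors the single shared mapping restriction 
described above.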