[LLVMdev] Future plans for GC in LLVM

Thu Dec 4 19:38:13 PST 2014

Excellent direction. My input, although I'm verrrrry far removed from the project at this point:

• I vehemently support replacement of the gcroot intrinsic. It makes the system make unnecessarily conservative decisions by default, which is incompatible with producing a high-quality compiler for garbage-collected languages. I just never got ambitious enough to replace it.
• The OCaml-specific code can be jettisoned. OCaml was hostile to contributors outside of France, and INRIA specifically, so no published callers use it that I'm aware of. (The OCaml runtime also requires an alternative calling convention, so it's highly unlikely that there's a stray caller!)
• Shadow-stack is probably useful for bootstrapping up new runtimes/new languages because it's crazy simple to interoperate with. It's not threadsafe or anything tho, so don't let it stand in the way of a high quality implementation.
• Generating a different stack map at every safepoint is highly desirable. Anything that depends on the contrary deserves to be broken!
• It was my experience that porting a compiler was simplified by not concurrently porting the runtime. So the ability to customize serialization of the stack map was valuable, particularly if it is inexpensive to preserve that behavior. (If LLVM has gotten into the business of providing collector runtimes and memory allocators and …, then dump it!)
• Interior pointers pose a particular challenge once the gcroot intrinsic is removed. I was totally punting on that problem domain.
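
To make the shadow-stack point concrete, here is a minimal sketch of the idea in C++. All names (ShadowFrame, ShadowFrameScope, g_shadow_top, collect_live_roots) are hypothetical and are not LLVM's actual shadow-stack layout; the sketch only assumes that each function registers its root slots in a linked chain that the collector can walk without any stack map.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame record: one per active function that holds GC roots.
struct ShadowFrame {
    ShadowFrame* prev;      // frame of the caller, or nullptr at the bottom
    std::size_t num_roots;  // number of root slots in this frame
    void** roots;           // array of slots, each holding an object pointer or nullptr
};

// One global chain head. This is what makes the scheme simple to interoperate
// with, and also why it is not thread-safe: every thread would share it.
static ShadowFrame* g_shadow_top = nullptr;

// RAII helper that pushes a frame on entry and pops it on exit.
class ShadowFrameScope {
public:
    ShadowFrameScope(void** roots, std::size_t n) : frame_{g_shadow_top, n, roots} {
        g_shadow_top = &frame_;
    }
    ~ShadowFrameScope() { g_shadow_top = frame_.prev; }
    ShadowFrameScope(const ShadowFrameScope&) = delete;
    ShadowFrameScope& operator=(const ShadowFrameScope&) = delete;

private:
    ShadowFrame frame_;
};

// Root scan as a collector might do it: walk the chain from the newest frame
// to the oldest and gather every non-null root.
std::vector<void*> collect_live_roots() {
    std::vector<void*> live;
    for (ShadowFrame* f = g_shadow_top; f != nullptr; f = f->prev) {
        for (std::size_t i = 0; i < f->num_roots; ++i) {
            if (f->roots[i] != nullptr) live.push_back(f->roots[i]);
        }
    }
    return live;
}
```

The cost is a push and pop on every call plus roots pinned to memory slots, which is why it suits bootstrapping more than a tuned implementation.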

Psyched to see movement here. :)

- Gordon

P.S. Please keep tagged pointers in mind somehow. *coughcoughlispmachinecough*

> On Dec 4, 2014, at 20:50, Philip Reames <listmail at philipreames.com> wrote:
> 
> Now that the statepoint changes have landed, I wanted to start a discussion about what's next for GC support in LLVM.  I'm going to sketch out a strawman proposal, but I'm not set on any of this.  I mostly just want to draw interested parties out of the woodwork.  :)
> 
> Overall Direction:
> In the short term, my intent is to preserve the functionality of the existing code, but migrate towards a position where the gcroot specific pieces are optional and well separated.  I also plan to start updating the documentation to reflect a separation between the general support for garbage collection (function attributes, identifying references, load and store barrier lowering, generating stack maps) and the implementation choices (gcroot & its lowering vs statepoints & addr spaces for identifying references).
> 
> Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot lowering code and in-tree GCStrategies unless an interested party speaks up.  I have no problem with retaining some of the existing pieces for legacy support or helping users to migrate, but as of right now, I don't know of any such active users.  The only exception to this might be the shadow stack GC.  "Eventually" in this context means at least six months from now, but likely less than 18 months.  Hopefully, that's vague enough.  :)
> 
> HELP - If anyone knows which OCaml implementation and which Erlang implementation triggered the in-tree GC strategies, please let me know!
> 
> 
> Near Term Changes:
> - Migrate ownership of GCStrategy objects from GCModuleInfo to LLVMContext.  In theory, this loses the ability for two different Modules to have the same collector with different state, but I know of no use case for this.
> - Modify the primary Function::getGC/setGC interface to return a reference to the GCStrategy object, not a string.  I will provide a Function::setGCString and getGCString.
> - Extend the GCStrategy class to include a notion of which compilation strategy is being used.  The two choices right now will be Legacy and Statepoint.  (Longer term, this will likely become a more fine grained choice.)
> - Separate GCStrategy and related pieces from the GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At first, this will simply mean clarifying documentation and rearranging code a bit.
> - Document/clarify the callbacks used to customize the lowering. Decide which of these make sense to preserve and document.
> 
> (Lest anyone get the wrong idea, the above changes are intended to be minor cleanup.  I'm not looking to do anything controversial yet.)
> 
> Questions:
> - Is providing the ability to generate a custom binary stack map format a valuable feature?  Adapting the new statepoint infrastructure to work with the existing GCMetadataPrinter classes wouldn't be particularly hard.
> - Are there any GCs out there that need gcroot's single stack slot per value implementation?   By default, statepoints may generate a different stackmap for every safepoint in a function.
> - Is using gcroot and allocas to mark pointers as garbage collected references valuable?  (As opposed to using address spaces on the SSA values themselves?)  Long term, should we retain the gcroot marker intrinsics at all?
> 
> 
> Philip
> 
> Appendix: The Current Implementation's Key Classes:
> 
> GCStrategy - Provides a configurable description of the collector. The strategy can also override parts of the default GC root lowering strategy.  The concept of such a collector description is very valuable, but the current implementation could use some cleanup.  In particular, the custom lowering hooks are a bit of a mess.
> 
> GCMetadataPrinter - Provides a means to dump a custom binary format describing each function's safepoints.  All safepoints in a function must share a single root Value to stack slot mapping.
> 
> GCModuleInfo/GCFunctionInfo - These contain the metadata which is saved to enable GCMetadataPrinter.
>
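
The quoted message contrasts gcroot's single root-to-stack-slot mapping per function with statepoints, which may produce a different stack map at every safepoint. A rough C++ sketch of the two lookup shapes follows. The types PerFunctionMap and PerSafepointMap, and the choice to key records by return address and to store frame-relative offsets, are illustrative assumptions and not LLVM's actual data structures or serialized format.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: frame-relative offsets of slots that hold live references.
using SlotOffsets = std::vector<std::int32_t>;

// gcroot-style shape: one slot set per function, shared by every safepoint.
// Every root needs a stack slot for the whole function, live or not.
struct PerFunctionMap {
    SlotOffsets slots;

    const SlotOffsets& lookup(std::uintptr_t /*return_address*/) const {
        return slots;  // the safepoint's address does not change the answer
    }
};

// Statepoint-style shape: one record per safepoint, so each record lists only
// what is live at that point (assumed here to be keyed by return address).
struct PerSafepointMap {
    std::map<std::uintptr_t, SlotOffsets> records;

    // Returns nullptr when the address is not a known safepoint.
    const SlotOffsets* lookup(std::uintptr_t return_address) const {
        auto it = records.find(return_address);
        return it == records.end() ? nullptr : &it->second;
    }
};
```

A runtime written against the first shape can be fed from the second by emitting the same slot set at every safepoint; the reverse does not hold, which is why the question about collectors that depend on the single-slot behavior matters.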