[LLVMdev] Future plans for GC in LLVM

Tue Dec 9 16:27:56 PST 2014

On 12/09/2014 03:12 AM, Ben Karel wrote:
> Hello Philip,
>
> I am an active user of LLVM's GC infrastructure. Here are some notes 
> on how I currently use gcroot():
Wow, thank you for sharing!  Your usage is far beyond anything else I've 
heard of.  Can I ask which project/language this is?  Or is that 
proprietary?
>
> 1) I use several IRs. A high-level (CPS/SSA hybrid) IR is the target 
> of inlining and other such high-level optimization; most allocation is 
> implicit in the high level IR. Closure conversion/lambda-lifting 
> produces a variant of that IR, extended with explicit allocation 
> constructs. A separate pass introduces the allocations themselves.
> 2) Next, a dataflow-based root insertion pass inserts the minimal(ish) 
> set of roots and root reloads to preserve the invariant that every 
> live GCable pointer is in a root at each potential GC point. The root 
> insertion pass also computes liveness and uses it to minimize the set 
> of roots via slot coloring. The result of root insertion is (very 
> close to) LLVM IR.
I would be really interested in hearing more about your implementation 
for this part.  We need to solve a similar problem in the statepoint 
lowering code.  What we have to date is a simply greedy scheme that 
works reasonable well in practice, but leaves lots of room for improvement.
> 3) Potential GC points are determined inter-procedurally, and call 
> sites which are statically known to (transitively) not GC do not break 
> GCable pointer live ranges.
I was planning something like this for LLVM at some point.  It's 
relatively low on my priority list since IPO tends to be of minimal 
interest when JITing which is my primary use case.
> 4) I use a lightly-modified variant of the OCaml plugin's stackmap format.
> 5) I don't currently use load or store barriers, but do eventually 
> plan to.
> 6) I do support GCing through global variables.
> 7) I wrote a custom LLVM pass to verify that values loaded from GC 
> roots aren't used across GC points.
Cool.  We have a similar one for statepoints.  (If you can share, 
getting both into the common tree would be nice.)
>
> When I first started using gcroot(), I used the naïve approach of 
> reloading roots after every call, and encountered significant 
> overhead. The optimizations sketched above significantly reduced that 
> overhead.  Unfortunately, I don't know of any meaningful 
> language-independent benchmark suites for measuring that sort of 
> overhead, and of course the overhead on any given program will 
> strongly depend on details of how that program is written...
>
> Also, FWIW when I first started, I got up and running with the shadow 
> stack, just as Gordon described, before transitioning to the "real" 
> infrastructure. As long as it doesn't carry a significant burden on 
> the implementation side, I think it's worth having, because it 
> significantly improves the learning curve for new users.
I'm planning on retaining the shadow stack mechanism.
>
> OK, direct answers to some of your questions:
>
> * I think custom stack map formats have small but non-zero value. 
> There have been a few papers (most in the early 90's, I think) which 
> showed that stack maps can make up a non-trivial fraction of a 
> binary's size, and thus are increasingly desirable to optimize as 
> programs grow larger. My verdict: if support for custom formats are 
> ever actively impeding forward progress, toss 'em; otherwise, there 
> should (eventually) be a more detailed look at the costs and benefits.
My preferred usage model would be: LLVM generates standard format, 
runtime parses and saves in custom binary format.

Having said that, retaining the capability doesn't seem to involve too 
much complexity.  I see no reason to kill it since multiple folks seem 
to want it.
>
> * As above, I think the primary benefit of one-stackmap-per-safepoint 
> is saving space at the cost of time (and root traffic). AFAIK the 
> primary virtue of the one-stack-slot-per-gcroot() implementation is 
> implementation simplicity, nothing more.
>
> * The ultimate strength & weakness of gcroot() is that basically 
> everything is left to the frontend. There's very little unavoidable 
> overhead imposed on a frontend that is willing to go the distance to 
> generate non-naïve code -- any knowledge that the frontend has can be 
> used to generate better code. Unfortunately, this also places a rather 
> heavy burden on the frontend to generate valid & efficient code.
>
>
> Since I haven't had the chance to look beyond mailing list & blog 
> posts on statepoints, I can't comment much on their tradeoffs vs 
> gcroots. The high-level impression I get is that statepoints will 
> allow a less-sophisticated frontend to get better results than naïve 
> usage of gcroot(). It also looks like statepoints will do a better job 
> of steering frontends away from the pitfalls of using GC-invalidated 
> pointer values.  I have no idea whether there will be any codegen 
> benefit for more sophisticated frontends, but I'll be happy to port my 
> implementation to statepoints once they've gotten a chance to settle 
> down somewhat, and provide more detailed feedback to help determine 
> what to eventually do with gcroot(). I can currently only do ad-hoc 
> benchmarking, but hopefully by the time gcroot() is on the chopping 
> block, I'll have a more extensive automated performance regression 
> test suite ;-)
I'll be very interested in your results.  I'm quite sure you'll find 
stability and performance bugs.  The existing code 'works' but has 
really only been hammered in one particular use case (source language, 
virtual machine).  Every new consumer will help to make the code more 
robust.  Let me know when you're ready to start prototyping this.  I'm 
happy to answer questions and help guide you around any bugs you might 
find.

I suspect from what you've said above that we might want to extend the 
mechanism slightly to allow you to retain your preassigned stack slots.  
You may be getting better spilling code than the current implementation 
would give you by default.  This seems like it might be a generally 
useful mechanism.  We could either use a call attribute on the 
gc.relocate, a bit of metadata, or possible an optional third argument 
that specifies an 'abstract slot'.  It would depend on your initial 
results and how much we've managed to improve the lowering code by then.  :)
>
>
> One quick question: I think I understand why address spaces are needed 
> to distinguish GCable pointers for late safepoint placement, but are 
> they also needed if the frontend is inserting all safepoints itself?
Nope.  They might allow some additional sanity checks, but nothing that 
is currently checked in relies on address spaces in any way. My plan is 
to make the pointer distinction mechanism (addrspace, gcroot, vs custom) 
a property of the GCStrategy with easily control extension points.
>
>
> Finally, thank you for taking on the burden of improving LLVM's GC 
> functionality!
Thank you for sharing your experience!
>
>
> On Thu, Dec 4, 2014 at 8:50 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>     Now that the statepoint changes have landed, I wanted to start a
>     discussion about what's next for GC support in LLVM.  I'm going to
>     sketch out a strawman proposal, but I'm not set on any of this.  I
>     mostly just want to draw interested parties out of the woodwork.  :)
>
>     Overall Direction:
>     In the short term, my intent is to preserve the functionality of
>     the existing code, but migrate towards a position where the gcroot
>     specific pieces are optional and well separated.  I also plan to
>     start updating the documentation to reflect a separation between
>     the general support for garbage collection (function attributes,
>     identifying references, load and store barrier lowering,
>     generating stack maps) and the implementation choices (gcroot &
>     it's lowering vs statepoints & addr spaces for identifying
>     references).
>
>     Longer term, I plan to *EVENTUALLY DELETE* the existing gcroot
>     lowering code and in tree GCStrategies unless an interesting party
>     speaks up.  I have no problem with retaining some of the existing
>     pieces for legacy support or helping users to migrate, but as of
>     right now, I don't know of any such active users.  The only
>     exception to this might be the shadow stack GC.  Eventually in
>     this context is at least six months from now, but likely less than
>     18 months. Hopefully, that's vague enough.  :)
>
>     HELP - If anyone knows which Ocaml implementation and which Erlang
>     implementation triggered the in tree GC strategies, please let me
>     know!
>
>
>     Near Term Changes:
>     - Migrate ownership of GCStrategy objects from GCModuleInfo to
>     LLVMContext.  In theory, this looses the ability for two different
>     Modules to have the same collector with different state, but I
>     know of no use case for this.
>     - Modify the primary Function::getGC/setGC interface to return a
>     reference the GCStrategy object, not a string.  I will provide a
>     Function::setGCString and getGCString.
>     - Extend the GCStrategy class to include a notion of which
>     compilation strategy is being used.  The two choices right now
>     will be Legacy and Statepoint.  (Longer term, this will likely
>     become a more fine grained choice.)
>     - Separate GCStategy and related pieces from the
>     GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering code.  At
>     first, this will simply mean clarifying documentation and
>     rearranging code a bit.
>     - Document/clarify the callbacks used to customize the lowering.
>     Decide which of these make sense to preserve and document.
>
>     (Lest anyone get the wrong idea, the above changes are intended to
>     be minor cleanup.  I'm not looking to do anything controversial yet.)
>
>     Questions:
>     - Is proving the ability to generate a custom binary stack map
>     format a valuable feature?  Adapting the new statepoint
>     infrastructure to work with the existing GCMetadataPrinter classes
>     wouldn't be particularly hard.
>     - Are there any GCs out there that need gcroot's single stack slot
>     per value implementation?   By default, statepoints may generate a
>     different stackmap for every safepoint in a function.
>     - Is using gcroot and allocas to mark pointers as garbage
>     collected references valuable?  (As opposed to using address
>     spaces on the SSA values themselves?)  Long term, should we retain
>     the gcroot marker intrinsics at all?
>
>
>     Philip
>
>     Appendix: The Current Implementations Key Classes:
>
>     GCStrategy - Provides a configurable description of the collector.
>     The strategy can also override parts of the default GC root
>     lowering strategy.  The concept of such a collector description is
>     very valuable, but the current implementation could use some
>     cleanup.  In particular, the custom lowering hooks are a bit of a
>     mess.
>
>     GCMetadataPrinter - Provides a means to dump a custom binary
>     format describing each functions safepoints.  All safepoints in a
>     function must share a single root Value to stack slot mapping.
>
>     GCModuleInfo/GCFunctionInfo - These contain the metadata which is
>     saved to enable GCMetadataPrinter.
>
>     _______________________________________________
>     LLVM Developers mailing list
>     LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>
>     http://llvm.cs.uiuc.edu
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141209/2934f16b/attachment.html>