[PATCH] Statepoint infrastructure for garbage collection

Thu Oct 23 23:01:02 PDT 2014

> On Oct 8, 2014, at 2:24 PM, Philip Reames <listmail at philipreames.com> wrote:
> 
> Hi hfinkel, chandlerc, nicholas, sanjoy, atrick, ributzka, theraven,
> 
> The attached patch implements an approach to supporting garbage collection in LLVM that has been mentioned on the mailing list a number of times by now.  There are a couple of issues that need to be addressed before submission, but I wanted to get this up to give maximal time for review.

This is really awesome work. It will be great to have solid and efficient design for anyone wanting precise GC with LLVM compiled code.

> The statepoint intrinsics are intended to enable precise root tracking through the compiler so as to support garbage collectors of all types.  Our testing to date has focused on fully relocating collectors (where pointers can change at any safepoint poll, or call site), but the infrastructure should support collectors of other styles.  The addition of the statepoint intrinsics to LLVM should have no impact on the compilation of any program which does not contain them.  There are no side tables created, no extra metadata, and no inhibited optimizations.

Soon we should promote llvm.experimental.patchpoint to a first-class intrinsic, so I want to know if there's something better we can do to handle your use case.

There are a variety of use-cases for patchpoint-like intrinsics. They will each use or ignore some subset of the operands. I realize now that the most sane approach is to define one well-supported intrinsic and use call attributes to denote subtle differences in semantics. For example, what if someone wants to implement polymorphic inline caches at call sites with precise GC? They are going to need a single intrinsic that does everything patchpoint and statepoint do. Two separate intrinsics would not give them valid semantics, unless they were somehow tied together and lowered as one instruction.

I understand that you don't want statepoints to carry patching-related cruft. But it's really just one constant operand, numbytes, that you set to zero!

One problem is how you lower the gc_result. However, it should be possible to detect during lowering that patchpoint is used by gc_result and react accordingly.

If there are other difficulties in how you custom lower, for example tracking pointer relocations, we could add a call attribute to identify the patchpoint as having statepoint semantics. We might also need that attribute to force frame index operands to be volatile stores.

It's certainly nice to see the distinction between deopt and gc args, but is it essential for any LLVM passes to make that distinction downstream from the safepoint insertion pass?

The only issue I can think of is generating stackmap records. But this really surprises me:

+ The ID field of the 'StkMapRecord' for a statepoint is meaningless and its value is
+ explicitly unspecified.

I envision stack maps working as follows:

(1) LLVM emits a "raw" stackmap record.

(2) The JIT parses and compresses each record, using the ID to map to its own metadata describing the stackmap location. For example, LLVM can emit the deopt and GC parameters as a single list of locations. The JIT knows the number of deopt parameters for this ID, and can use the correct encoding for the remaining GC pairs.
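A minimal sketch of step (2), assuming the GC portion of each record arrives as (base, derived) location pairs after the deopt locations, and that the JIT keeps its own table mapping each stackmap ID to a deopt-parameter count. `SafepointInfo`, `CompressedRecord`, and `compress_record` are hypothetical names, not part of LLVM or the patch:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafepointInfo:
    """Hypothetical JIT-side metadata registered under one stackmap ID."""
    num_deopt: int  # how many leading locations carry deopt state


@dataclass(frozen=True)
class CompressedRecord:
    deopt: tuple     # deopt locations, in order
    gc_pairs: tuple  # (base, derived) GC location pairs


def compress_record(record_id, locations, metadata):
    """Split a raw record's flat location list using metadata looked up by ID."""
    info = metadata[record_id]
    deopt = tuple(locations[:info.num_deopt])
    rest = locations[info.num_deopt:]
    if len(rest) % 2 != 0:
        raise ValueError(f"record {record_id}: odd number of GC locations")
    # Even positions are bases, odd positions are the matching derived pointers.
    gc_pairs = tuple(zip(rest[0::2], rest[1::2]))
    return CompressedRecord(deopt, gc_pairs)
```

Because the split is driven entirely by JIT-side metadata found through the ID, LLVM can emit one flat list; that only works if the ID is meaningful.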

> 
> A statepoint works by transforming a call site (or safepoint poll site) into an explicit relocation operation.  It is the frontend's responsibility (or eventually the safepoint insertion pass we've developed, but that's not part of this patch) to ensure that any live pointer to a GC object is correctly added to the statepoint and explicitly relocated.  The relocated value is just a normal SSA value (as seen by the optimizer), so merges of relocated and unrelocated values are just normal phis.  The explicit relocation operation, the fact that the statepoint is assumed to clobber all memory, and the optimizer's standard semantics ensure that the relocations flow through IR optimizations correctly.
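To illustrate why an explicit relocation keeps the optimizer honest, here is a toy Python model (not LLVM code) of a copying collector in which a statepoint hands back a fresh value for every live pointer. It assumes, for simplicity, that every statepoint triggers a collection and that objects contain no pointers of their own; `MovingHeap`, `statepoint`, and `merge_example` are hypothetical names.

```python
class MovingHeap:
    """Toy copying heap: a 'pointer' is an integer address that changes when the object moves."""

    def __init__(self):
        self.objects = {}  # address -> payload
        self.next_addr = 0

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 1
        self.objects[addr] = payload
        return addr

    def collect(self, roots):
        """Copy every root to a fresh address, free everything else, return old->new map."""
        forwarding = {}
        survivors = {}
        for old in roots:
            if old not in forwarding:
                new = self.next_addr
                self.next_addr += 1
                survivors[new] = self.objects[old]
                forwarding[old] = new
        self.objects = survivors
        return forwarding


def statepoint(heap, live_gc_ptrs):
    """Model a call that reaches a safepoint: the collector may move objects, so the
    caller receives one relocated value per live pointer."""
    forwarding = heap.collect(live_gc_ptrs)
    return [forwarding[p] for p in live_gc_ptrs]


def merge_example(heap, take_call_path):
    p = heap.alloc("obj")
    if take_call_path:
        (p_relocated,) = statepoint(heap, [p])
        merged = p_relocated  # incoming value from the block containing the statepoint
    else:
        merged = p  # incoming value from the block that skipped the call
    return heap.objects[merged]  # the if/else merge plays the role of a phi
```

After the call the old value `p` is still an ordinary value, but nothing lives at that address any more; only the returned value is safe to use.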
> 
> During the lowering process, we currently spill aggressively to the stack.  This is not entirely ideal (and we have plans to do better), but it's functional, relatively straightforward, and closely matches the implementation of the patchpoint intrinsics.  We leverage the existing StackMap section format, which is already used by the patchpoint intrinsics, to report where pointer values live.  Unlike a patchpoint, these locations are known (by the backend) to be writeable during the call.  This enables the garbage collector to transparently read and update pointer values if required.  We do optimize lowering in certain well-known cases (constant pointers, a.k.a. null, being the key one).
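A rough sketch of what the runtime side of this can look like: since each GC pointer has been spilled to a stack slot whose location the stack map reports, the collector can rewrite the slot in place. This assumes patchpoint-style "Indirect" locations ([reg + offset] holds the value), models machine memory as a Python dict, and uses hypothetical names (`Location`, `relocate_frame`, `forwarding`) that are not LLVM APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    kind: str  # "Indirect": value stored at [reg + offset]; "Constant": value is offset
    reg: str
    offset: int


def relocate_frame(locations, regs, memory, forwarding):
    """Rewrite every spilled GC pointer named by one statepoint record in place.

    regs maps register names to values, memory maps addresses to stored words,
    and forwarding maps old object addresses to new ones (unmoved objects are absent)."""
    seen = set()
    for loc in locations:
        if loc.kind == "Constant":
            continue  # e.g. null: no frame slot to update
        if loc.kind != "Indirect":
            raise ValueError(f"unsupported location kind {loc.kind!r}")
        slot = regs[loc.reg] + loc.offset
        if slot in seen:
            continue  # base and derived may share a slot; relocate it only once
        seen.add(slot)
        old = memory[slot]
        memory[slot] = forwarding.get(old, old)
```

Constant locations such as null have no frame slot and are simply skipped, which is part of why that case is cheap to lower.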

It's very sad that you force spilling. That somewhat defeats the purpose of patch points and stack maps. Is that because you don't have a way to lower the statepoint such that it defines all the virtual registers that hold pointer values? That seems fixable with the right machine operand flags.

Is there also an issue with your runtime restoring callee saves when unwinding? This is unfortunately a platform-specific issue that LLVM JITs currently face. You may have solved it given you're only targeting one platform. Please bring it up in the MCJIT BOF on Tuesday if it's an issue for you.

> There are a few areas of this patch which could use improvement:
> - The patch needs to be rebased against TOT.  It's currently based against a roughly 3 week old snapshot.
> - The intrinsics should probably be renamed to include an "experimental" prefix. 
> - The usage of Direct and Indirect location types are currently inverted as compared to the definition used by patchpoint.  This is a simple fix.
> - The test coverage could be improved.  Most of the tests we've actually been using are built on top of the safepoint insertion mechanism (not included here) and our runtime.  We need to improve the IR level tests for optimizer semantics (i.e. not doing illegal transforms), and lowering.  There are some minimal tests in place for the lowering of simple statepoints.   
> - The documentation is "in progress" (to put it kindly.)
> - Many functions are missing doxygen comments
> - There's a hack in to force the use of RSP+Offset addressing vs RBP-Offset addressing for references in the StackMap section.  This works, shouldn't break anyone else, but should definitely be cleaned up.  The choice of addressing preference should be up to the runtime. 
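For reference, the StackMap documentation defines the patchpoint location kinds roughly as follows: Register (the value is in Reg), Direct (the value is Reg + Offset, e.g. the address of a frame object), Indirect (the value is spilled at [Reg + Offset]), and Constant (the value is Offset); the constant-pool kind is omitted here. The sketch below decodes those kinds against a toy register file and memory, using string kind names instead of the binary encoding; `read_location` is a hypothetical helper. It also shows that an RSP+Offset and an RBP-Offset location can name the same slot, which is why the addressing preference could be left to the runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    kind: str  # "Register", "Direct", "Indirect", or "Constant"
    reg: str
    offset: int


def read_location(loc, regs, memory):
    """Return the runtime value a stack map location describes (patchpoint convention)."""
    if loc.kind == "Register":
        return regs[loc.reg]
    if loc.kind == "Direct":
        return regs[loc.reg] + loc.offset  # the address itself is the value
    if loc.kind == "Indirect":
        return memory[regs[loc.reg] + loc.offset]  # value spilled at that address
    if loc.kind == "Constant":
        return loc.offset
    raise ValueError(f"unsupported location kind {loc.kind!r}")
```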
> 
> When reviewing, I would greatly appreciate feedback on which issues need to be fixed before submission and those which can be addressed afterwards.  It is my plan to actively maintain and enhance this infrastructure over the next few months (and years).  It's already been developed out of tree entirely too long (our fault!), and I'd like to move to incremental work in tree as quickly as feasible.

This isn't a proper review, but I do have one comment on the code. There is an awful lot of custom code in SelectionDAGBuilder (600 lines). Very few LLVM devs will want to see this, so I suggest finding a way to move it into another file even if it's not easy to do.

-Andy

> 
> Planned enhancements after submission:
> - The ordering of arguments in statepoints is essentially historical cruft at this point.  I'm open to suggestions on how to make this more approachable.  Reordering arguments would (preferably) be a post-commit action.
> - Support for relocatable pointers in callee-saved registers across call sites.  This will require the notion of an explicit relocation pseudo-op and support for it throughout the backend (particularly the register allocator).
> - Optimizations for non-relocating collectors.  For example, the clobber semantics of the spill slots aren't needed if the collector isn't relocating roots. 
> - Further optimizations to reduce the cost of spilling around each statepoint (when required at all). 
> - Support for invokable statepoints. 
> - Once this has baked in tree for a while, I plan to delete the existing gc_root code.  It is unsound, and essentially unused.
> 
> In addition to the enhancements to the infrastructure in the currently proposed patch, we're also working on a number of follow up changes:
> - Verification passes to confirm that safepoints were inserted in a semantically valid way (i.e. no memory access of a value after it has been inserted)
> - A transformation pass to convert naive IR to include both safepoint polling sites, and statepoints on every non-leaf call.  This transformation pass can be used at initial IR creation time to simplify the frontend authors' work, but is also designed to run on *fully optimized* IR, provided the initial IR meets certain (fairly loose) restrictions. 
> - A transformation pass to convert normal loads and stores into user provided load and store barriers.
> - Further optimizations to reduce the number of safepoints required, and improve the infrastructure as a whole. 
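A much-simplified sketch of the kind of check the first verification pass could perform, assuming straight-line code in which every tracked value is a GC pointer: after a statepoint, only the relocated values it produces may be used. The tuple-based instruction format and `find_stale_uses` are hypothetical and are not the Azul implementation:

```python
def find_stale_uses(instrs):
    """Return indices of 'use' instructions that read a GC pointer which an
    earlier statepoint may have moved without it being relocated."""
    valid = set()  # pointers that are safe to use right now
    stale = set()  # pointers invalidated by some earlier statepoint
    errors = []
    for index, instr in enumerate(instrs):
        op = instr[0]
        if op == "def":
            valid.add(instr[1])
        elif op == "statepoint":
            relocs = instr[1]  # maps each pre-call pointer to its relocated value
            for old in relocs:
                if old not in valid:
                    raise ValueError(f"statepoint at {index} relocates unknown value {old!r}")
            stale |= valid  # the call may move any object
            valid = set(relocs.values())
        elif op == "use":
            name = instr[1]
            if name in stale and name not in valid:
                errors.append(index)
        else:
            raise ValueError(f"unknown op {op!r}")
    return errors
```

A real pass would have to work over the CFG (a use may be reached through a phi) and distinguish GC from non-GC pointers.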
> 
> We've been working on these topics for a while, but the follow on patches aren't quite as mature as what's being proposed now.  Once these pieces stabilize a bit, we plan to upstream them as well.  For those who are curious, our work on those topics is available here: https://github.com/AzulSystems/llvm-late-safepoint-placement
> 
> http://reviews.llvm.org/D5683
> 
> Files:
>  docs/Statepoints.rst
>  include/llvm/CodeGen/FunctionLoweringInfo.h
>  include/llvm/CodeGen/MachineInstr.h
>  include/llvm/CodeGen/StackMaps.h
>  include/llvm/IR/Intrinsics.td
>  include/llvm/IR/Statepoint.h
>  include/llvm/Target/Target.td
>  include/llvm/Target/TargetFrameLowering.h
>  include/llvm/Target/TargetOpcodes.h
>  lib/Analysis/TargetTransformInfo.cpp
>  lib/CodeGen/InlineSpiller.cpp
>  lib/CodeGen/LocalStackSlotAllocation.cpp
>  lib/CodeGen/PrologEpilogInserter.cpp
>  lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
>  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
>  lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
>  lib/CodeGen/StackMaps.cpp
>  lib/CodeGen/TargetLoweringBase.cpp
>  lib/IR/CMakeLists.txt
>  lib/IR/Function.cpp
>  lib/IR/LLVMContext.cpp
>  lib/IR/Statepoint.cpp
>  lib/IR/Verifier.cpp
>  lib/Target/X86/X86FrameLowering.cpp
>  lib/Target/X86/X86FrameLowering.h
>  lib/Target/X86/X86ISelLowering.cpp
>  lib/Target/X86/X86MCInstLower.cpp
>  lib/Transforms/InstCombine/InstCombineCalls.cpp
>  test/CodeGen/X86/statepoint-call-lowering.ll
>  test/CodeGen/X86/statepoint-stack-usage.ll
>  test/CodeGen/X86/statepoint-stackmap-format.ll
>  test/Verifier/statepoint-non-gc-ptr.ll
>  test/Verifier/statepoint.ll
>  utils/TableGen/CodeGenTarget.cpp
> <D5683.14604.patch>