[PATCH] Statepoint infrastructure for garbage collection

Wed Oct 15 14:52:15 PDT 2014

Kevin,

Let me try to answer the point you're getting at.  In doing so, I want 
to explicitly separate the statepoint intrinsics which are currently up 
for review, and the future late safepoint placement. The statepoint 
intrinsics have value separate from the late safepoint placement 
approach, and I want to justify them on their own merits.

The basic problem we're trying to solve with these intrinsics is 
supporting fully relocating collectors.  By definition, such a collector 
needs to be precise w.r.t. root tracking.  Even worse, we need to ensure 
that *all copies* of a pointer are updated.  It is not acceptable to 
make two copies of a pointer, update one of them, then use the other for 
a memory access.
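
As a rough illustration of that failure mode, here is a toy model in 
Python.  It is not LLVM IR and not the code under review; ToyHeap, 
relocate_all and the address arithmetic are all hypothetical names and 
choices made up for this sketch.  The collector moves every object but 
only rewrites the root slots it was told about, so the untracked copy is 
left stale.

```python
# Toy model of a relocating collector. Everything here is hypothetical
# and exists only to illustrate the "missed copy" problem.
class ToyHeap:
    def __init__(self):
        self.objects = {}        # address -> payload
        self.next_addr = 0x1000

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 0x10
        self.objects[addr] = payload
        return addr

    def load(self, addr):
        # Dereferencing an address that no longer holds an object is the bug.
        if addr not in self.objects:
            raise RuntimeError(f"use of stale pointer {addr:#x}")
        return self.objects[addr]

    def relocate_all(self, roots):
        """Move every object, then rewrite only the reported root slots."""
        forwarding = {}
        for old in list(self.objects):
            new = old + 0x100000
            self.objects[new] = self.objects.pop(old)
            forwarding[old] = new
        for name in roots:
            roots[name] = forwarding[roots[name]]


heap = ToyHeap()
p = heap.alloc("payload")
reported = {"p": p}    # the copy recorded in the stack map
q = p                  # a second copy the collector never sees

heap.relocate_all(reported)

assert heap.load(reported["p"]) == "payload"   # the updated copy is fine
try:
    heap.load(q)                               # the other copy is stale
    stale_use_detected = False
except RuntimeError:
    stale_use_detected = True
```

In this model the stale access raises an error; in a real system it 
would silently read or write whatever now lives at the old address.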

If the compiler is allowed to introduce derived pointers (i.e. pointer 
valued temporaries created by the compiler which point somewhere within 
an object, or outside it, but associated with it), we also need to track 
which *object* each *pointer* to be updated is associated with.  This is 
required to safely update the pointers.
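
To make the base/derived relationship concrete, here is a minimal 
Python sketch.  The function name and the addresses are assumptions 
chosen for illustration only.  The point is that a derived pointer can 
only be updated if the collector knows the old and new address of the 
object it belongs to.

```python
def relocate_derived(base_old, derived_old, base_new):
    """Relocate a derived pointer by preserving its offset from its base.

    Hypothetical helper: the offset may be negative or point past the end
    of the object, which is why the base must be tracked explicitly.
    """
    offset = derived_old - base_old
    return base_new + offset


base_old = 0x1000
derived_old = base_old + 24     # e.g. the address of an interior field
base_new = 0x8000               # where the collector moved the object

derived_new = relocate_derived(base_old, derived_old, base_new)
assert derived_new == 0x8018

# A pointer just before the object still relocates correctly.
assert relocate_derived(base_old, base_old - 8, base_new) == 0x7FF8
```

Without the base, the collector would have to guess which object an 
out-of-bounds derived pointer is associated with.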

For the sake of argument, let's say our frontend does safepoint insertion.

There are a couple of approaches which seem like they might work.  Let's 
explore each in turn:
- We could use patchpoints to record all the values needed for the GC 
stack map.  This mostly works, but requires that the patchpoint not be 
marked readonly or readnone (to prevent illegal reorderings).  That 
could be a usage convention.  The real problem is that the compiler is 
still free to introduce multiple *copies* of an SSA value over the 
patchpoint.  (This is completely legal under SSA semantics.)  When it 
does so, it creates a situation where the gc could fail to update a 
pointer which will then be dereferenced. That's a bug.  Worth stating 
explicitly, I believe the patchpoint scheme would be sufficient *if you 
do not ever relocate a root*.
- We could use gc.root.  gc.root defines the allocas, but does not 
define the call format, or any of the mechanisms to ensure proper 
relocation.  As such, it *by itself* is not viable.  Also, gc.root 
inherently assumes every value will have a stack slot. Without *heavy* 
reengineering, there's no way to have a gc pointer in a callee saved 
register over a call site.  This is an unfortunate limitation.  Any call 
representation without explicit relocation suffers from the same bug as 
the patchpoint scheme.
- We could combine gc.root allocas and patchpoints.  This essentially 
combines the flaws (no gc pointers in callee saved registers over calls, 
and missed copies), with no benefit.

The statepoint intrinsics are basically the patchpoint option above, but 
with relocation made explicit in the IR.  While it's still legal for the 
optimizer to create a copy of the value feeding a statepoint, that's now 
okay.  By construction, there can be no use of the original SSA value 
(and thus the copy) after the statepoint. Instead, the explicitly 
relocated value is used.
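
The shape of that guarantee can be sketched with a toy Python model.  
This is not the intrinsic's actual signature or IR syntax; the 
statepoint function below and the dictionary heap are hypothetical 
stand-ins.  What it shows is that the safepoint hands back relocated 
values, and that code after it refers only to those results.

```python
def statepoint(heap, live):
    """Toy safepoint: moves every object and returns the relocated values
    of `live`, in order. Code after the call must use only these results."""
    forwarding = {}
    for old in list(heap):
        new = old + 0x100000
        heap[new] = heap.pop(old)
        forwarding[old] = new
    return tuple(forwarding[ptr] for ptr in live)


heap = {0x1000: "payload"}       # address -> payload
p = 0x1000
p_copy = p                       # an optimizer-introduced copy feeding the call

(p_reloc,) = statepoint(heap, (p_copy,))

# After the statepoint neither p nor p_copy is used again, so it does
# not matter that both still hold the old address.
value = heap[p_reloc]
assert value == "payload"
```

Because the only way to name the pointer after the call is through the 
returned value, copies made before the call cannot leak a stale address.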

To summarize: We need (something like) statepoints for correctness of 
fully relocating collectors.

(The points I'm making here are somewhat subtle.  If it would help to 
have IR examples here, ask.  I'm deferring writing them because it's 
time consuming.)

Other advantages of the statepoint approach:

The gc.relocate intrinsics (part of the statepoint proposal) also make 
it explicit in the IR what the base object of each pointer to be 
relocated is.  This isn't *required* (you could encode the same 
information in the arguments of the statepoint), but making it explicit 
is much cleaner.

The explicit relocation notation has the potential to be extended into 
the backend.  With some register allocator changes (not part of this 
patch!), we could support gc pointers in callee saved registers.  This 
is possible with the (incorrect) patchpoint scheme.  It is possible, but 
*hard*, with the gc.root scheme.

The posted patch includes a couple of small optimizations (e.g. null 
forwarding) that help performance, but could (probably) be implemented 
on top of another scheme.  We have a number of planned optimizations on 
the statepoint mechanism.

Now, let me finally bring up late safepoint placement. The only real 
impact on this patch is that, to date, we have only focused on the 
*correctness* of a statepoint passing through the optimizer.  We have 
not attempted to teach the optimizer about how to leverage one or 
perform optimizations over one.  There's room for improvement here (e.g. 
not completely blocking inlining), but we prefer to approach this 
problem by simply inserting them late.   You could instead choose to 
insert them at generation time, and teach the optimizer about their 
semantics.  That *strategy choice* is independent of the representation 
chosen provided that representation is *correct*.

Yours,
Philip

On 10/14/2014 07:01 PM, Kevin Modzelewski wrote:
> I think a change like this might be more compelling if you could give more detail on how it would actually help (I can't find the detail I'm looking for in your blog posts).  It seems like the value of this patch is that it will work with late safepoint placement, but it'd be nice to see some examples of cases where late safepoint placement gives you something that early safepoint placement (ie by the frontend) doesn't.  It kind of feels like either approach will work well with only non-gc values, and neither approach will be able to do much optimization when you do function calls.  I'm not trying to claim that that's necessarily true, but it'd be easier to understand your point if there was some example IR.
>
> http://reviews.llvm.org/D5683
>
>