[LLVMdev] Future plans for GC in LLVM

Tue Dec 9 11:56:57 PST 2014

> - From the documentation it looks like you're using the patchpoint
>   stackmap format
>   (http://llvm.org/docs/StackMaps.html#stackmap-format).  In that
>   format you can describe register locations, but the overview
>   (http://llvm.org/docs/Statepoints.html#overview) implies that all
>   gc pointers are spilled to the stack.  Is the spilling to memory
>   required?  Or is the plan to allow gc pointers to reside in registers
>   as well?  (I'm hoping that a store/load at safepoints won't be
>   required and that they can stay register resident.)

We're currently spilling to the stack to keep the implementation
simple.  Ideally we should be able to lower the complete gc.statepoint
construct to a no-op and have the GC deal with whatever decision the
register allocator made.
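
The following minimal Python sketch illustrates what "have the GC deal with whatever decision the register allocator made" could look like. A collector updates roots that a simplified stack map record places either in a register or in a stack slot. The location classes, `Frame`, and `update_roots` are hypothetical and exist only for illustration. They are loosely modeled on the Register and Indirect location kinds in the StackMaps documentation, not on its actual encoding. The sketch also ignores that a real collector would need to find where callees spilled callee-saved registers.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical, simplified stack map locations (not the real StackMaps encoding).
@dataclass
class RegisterLoc:
    reg: str          # root lives in this register at the safepoint

@dataclass
class StackSlotLoc:
    offset: int       # root lives in the stack slot at [frame base + offset]

Location = Union[RegisterLoc, StackSlotLoc]

@dataclass
class Frame:
    registers: Dict[str, int]   # register contents captured at the safepoint
    stack: Dict[int, int]       # stack slot offset -> value

def update_roots(frame: Frame, roots: List[Location], forward: Dict[int, int]) -> None:
    """Rewrite every root in place using the GC's forwarding map
    (old address -> new address), wherever the root happens to live."""
    for loc in roots:
        if isinstance(loc, RegisterLoc):
            old = frame.registers[loc.reg]
            frame.registers[loc.reg] = forward.get(old, old)
        else:
            old = frame.stack[loc.offset]
            frame.stack[loc.offset] = forward.get(old, old)
```

For example, with `rbx` holding 0x1000, slot -16 holding 0x2000, and a forwarding map `{0x1000: 0x9000, 0x2000: 0xA000}`, both roots get rewritten. The collector never needs the compiler to have spilled the register-resident one.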

> - I'm still fuzzy on how code motion is blocked from moving SSA uses past
>   the safepoint once they've been inserted?  I'm likely just missing
>   some invariant in LLVM or the design since I can't seem to noodle it
>   out from what I've seen.

The representation only prevents "observable" uses of the GC pointers
from being moved across safepoints.  The semantics of gc.statepoint is
not that all uses of `%ptr' automatically become uses of the latest,
most relocated value of the object `%ptr' points to; but that the
gc.statepoint (through the relocate calls on the token it returns)
*explicitly* produces a `%ptr.reloc' that you're supposed to use
instead of `%ptr' once you've dynamically passed the gc.statepoint.
Ensuring this can involve inserting phi nodes.

So the following two pieces of code are semantically equivalent (in
pseudo-llvm):

  %cmp = (%ptr == null)
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)

Vs.

  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  %cmp = (%ptr == null)

In both code segments, `%cmp' is true if the *unrelocated* %ptr is
null, and neither segment looks at where %ptr was relocated to.
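
The following small Python simulation sketches these semantics. A moving "statepoint" copies the object to a new heap address. The relocated address is only available through `relocate`, and the original SSA value `ptr` keeps its old bits. That is why computing `%cmp` before or after the statepoint gives the same answer. `Token`, `statepoint`, `relocate`, `cmp_before`, and `cmp_after` are hypothetical names for this model, not the LLVM intrinsics.

```python
from typing import Dict, Tuple

class Token:
    """Hypothetical stand-in for the token a statepoint returns."""
    def __init__(self, moved: Dict[int, int]):
        self.moved = moved   # unrelocated address -> relocated address

def statepoint(heap: Dict[int, str], ptr: int, new_addr: int) -> Token:
    """Simulated moving collection: copy the object at `ptr` to `new_addr`.
    It may change the heap, but it cannot change the caller's value `ptr`."""
    if ptr == 0:                       # null is never relocated
        return Token({0: 0})
    heap[new_addr] = heap.pop(ptr)
    return Token({ptr: new_addr})

def relocate(tok: Token, ptr: int) -> int:
    """The only way to obtain the post-safepoint address of `ptr`."""
    return tok.moved[ptr]

def cmp_before(heap: Dict[int, str], ptr: int) -> Tuple[bool, int]:
    cmp = ptr == 0                     # %cmp = (%ptr == null)
    tok = statepoint(heap, ptr, 0x800)
    return cmp, relocate(tok, ptr)

def cmp_after(heap: Dict[int, str], ptr: int) -> Tuple[bool, int]:
    tok = statepoint(heap, ptr, 0x800)
    ptr_reloc = relocate(tok, ptr)
    cmp = ptr == 0                     # still reads the *unrelocated* %ptr
    return cmp, ptr_reloc
```

Both orderings return the same `(cmp, ptr_reloc)` pair for a null and a non-null pointer. After either one, the object lives only at the relocated address.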

Since gc.statepoint is specified to possibly have arbitrary
side-effects and can read/write arbitrary memory, the following two
are *not* equivalent:

  %cmp = (%ptr == null)
  if (%cmp) *global = 42;
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)

Vs.

  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  %cmp = (%ptr == null)
  if (%cmp) *global = 42;

but the second one is equivalent to:

  %cmp = (%ptr == null)
  tok = statepoint(relocate %ptr)
  %ptr.reloc = relocate(tok, %ptr)
  if (%cmp) *global = 42;
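
The following toy interpreter illustrates why the store cannot cross the statepoint while the pure comparison can. It runs the three orderings above. The statepoint is modeled as something that may read arbitrary memory, and here it simply records the value of `*global` it observes. That is an assumed, minimal stand-in for "arbitrary side-effects". The op names and `run` are hypothetical.

```python
from typing import Dict, List, Tuple

FIRST = ["cmp", "store_if_cmp", "statepoint"]   # store before the statepoint
SECOND = ["statepoint", "cmp", "store_if_cmp"]  # both moved after it
THIRD = ["cmp", "statepoint", "store_if_cmp"]   # only the store moved after it

def run(program: List[str], ptr: int) -> Tuple[List[int], int]:
    """Return (values of *global seen by each statepoint, final *global)."""
    mem: Dict[str, int] = {"global": 0}
    observed: List[int] = []
    cmp = False
    for op in program:
        if op == "cmp":
            cmp = ptr == 0                   # %cmp = (%ptr == null)
        elif op == "store_if_cmp":
            if cmp:
                mem["global"] = 42           # if (%cmp) *global = 42
        elif op == "statepoint":
            observed.append(mem["global"])   # the safepoint may read *global
        else:
            raise ValueError(f"unknown op {op!r}")
    return observed, mem["global"]
```

With a null `ptr`, the statepoint in FIRST sees 42 while the one in SECOND sees 0, so those two differ. SECOND and THIRD behave identically, which matches the equivalence claimed above.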

> - In the CLR GC we don't require the base object pointer to be kept
>   alive for a derived managed pointer (interior pointer) but in your
>   design there is the requirement to maintain a base, derived pairing.
>   (If I remember right, this is a Java requirement.)  Is this a hard
>   requirement?  Or is there the potential for other collectors to deal
>   just with managed pointers?

I'm not familiar with the CLR GC; won't you need base pointers to be
able to relocate stack roots?  In any case, if you don't need base
pointers for derived pointers, you can just use the identity map as
the base-derived relationship (i.e. every pointer is a base pointer
for itself).
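
The following minimal sketch shows how base-derived pairs are relocated. The GC moves the object named by the base, then recomputes the derived pointer at the same offset. With the identity map the offset is zero, so each pointer is relocated on its own. `relocate_pairs` and the forwarding map are hypothetical. The model assumes the forwarding map has an entry for every pointer used as a base. A collector that tracks bare interior pointers would need its own way to find the enclosing object.

```python
from typing import Dict, List, Tuple

def relocate_pairs(pairs: List[Tuple[int, int]],
                   forward: Dict[int, int]) -> List[Tuple[int, int]]:
    """Relocate (base, derived) pairs using a forwarding map
    (old object address -> new object address)."""
    out = []
    for base, derived in pairs:
        new_base = forward.get(base, base)     # unmoved objects keep their address
        offset = derived - base                # 0 under the identity base-derived map
        out.append((new_base, new_base + offset))
    return out
```

For example, the interior pointer 0x1018 with base 0x1000 becomes 0x5018 when the object moves to 0x5000. An identity pair (0x2000, 0x2000) simply becomes (0x6000, 0x6000) if that object moves to 0x6000.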

-- Sanjoy