[LLVMdev] getting started with IR needing GC

Mon Apr 28 22:51:43 PDT 2008

On Mon, 2008-04-28 at 21:39 -0500, Lane Schwartz wrote:
> On Mon, Apr 28, 2008 at 8:31 PM, Gordon Henriksen
> <gordonhenriksen at mac.com> wrote:
> > On 2008-04-28, at 21:19, Lane Schwartz wrote:
> >
> >  > On Mon, Apr 28, 2008 at 2:13 PM, Gordon Henriksen <gordonhenriksen at mac.com
> >  >> stack and discover the return address of each call frame. LLVM
> >  >> doesn't provide such a facility.
> >  >
> >  > I guess I'm missing something here. Why does the GC need the return
> >  > addresses?
> >
> >
> >  Return addresses are directly discoverable from the stack; function
> >  entry points are not.
> 
> If you have the stack, and can therefore determine all live GC roots,
> why do you need function entry points? Is it so you know *when* you
> can safely run garbage collection?

Lane:

Having the stack map doesn't give you liveness information. If you know
where you are in the control flow of a procedure, you may be able to
know that certain pointer values in the corresponding stack frame are
live and others are not. Tracing non-live pointers can lead to fairly
dramatic increases in retained garbage. Partly for this reason there has
been a bunch of work done on optimization passes in GC'd languages to
explicitly nullify pointer values explicitly at any point where they
cease to be live. A flow-oblivious GC also causes procedure frame growth
because pointer temporaries and data temporaries cannot be stored in the
same location in the frame (because we need to know, for each location,
whether it is pointer or data). In a non-GC language, those could reuse
the same cell in the frame as long as the two values do not have
overlapping liveness.

More importantly, if you don't know where the procedure is in its
control flow, then you cannot know which callee-saved registers held
pointers at the time of the call, so you can't do any relocating GC if
pointers are permitted to be live in registers at a call site. This has
implications for register allocation. A naive-GC-friendly
registerization scheme, given a choice, should prefer to keep pointers
in caller-saved registers to avoid additional spill code. I don't know
whether the current LLVM register allocator does this (Gordon?). If
hardware register selection considers the load/save cost as part of its
registerization criteria (Gordon?), LLVM may end up solving this
implicitly as a side effect of considering the reload costs.

As I understand matters, LLVM does not presently provide sufficient
information to reconstruct register values as you walk the stack. This
implies (and here I *really* hope that Gordon will straighten me out)
that if your implementation uses a relocating GC strategy you need to
ensure in your IR that you reload any registerized pointer values after
any procedure call that might (transitively) call GC(). Conservatively
this tends to devolve into "reload any pointer after any procedure
call".

Caller-saved registers are not a real problem here, because the caller
can save them to local frame cells that are of the appropriate type
(pointer vs. data), though I do not know if the LLVM register
dump/reload support is pointer/data aware (Gordon?). The problem is
callee-saved registers. The callee has no type information about the
register's current value, and so cannot save it to an appropriately
typed save slot in the callee stack frame. Consider:

  A() has pointer in callee-saved register R17
  A() calls B()
  B() needs R17, so saves it to &R17. Does not know that R17 holds
      pointer.

GC walks stack frame for B(), cannot determine whether the location at
&R17 is pointer or data. Naive GC therefore cannot relocate R17. This
induces the requirement for A() to reload R17 after return from the call
to B().

The requirement to dump and reload these registers is awful, and it is
part of why Fergus Henderson's strategy for using an embedded shadow
stack doesn't look very bad in comparison to many "true" GC systems.

A hypothetical sophisticated stack walker with access to flow-sensitive
(i.e. per-return-PC) information could do better. As the hypothetical
walker proceeds up the stack, it remembers for each callee-saved
register the location of the currently live saved location for that
register. In my example, it would remember the location where R17 was
saved by B. At the time the location is saved the walker does not know
the type of that memory cell. As it proceeds further up the stack, it
discovers when it reaches A() that the currently live R17 is a pointer,
at which point it infers that the previously recorded save location must
be of pointer type (or at least, it is so for the control flow and stack
state that we are currently walking) and then goes back and traces the
saved memory location &R17 according to what is required by the
particular GC algorithm.

I'm not sure pragmatically how one might go about this in the LLVM
framework, or even if it is possible. The knowledge required by GC here
is actually the same as what you need for a debugger to walk the call
stack and keep track of register values, so it's not insane to imagine
that the compiler must be able to emit it, but I don't know how to get
emitted into a place that the stack walker is able to use.

Gordon: assume, for a moment, that I'm willing to go the extra several
miles to do this type if thing in my GC implementation. How might I go
about getting per-call-site register typings, register save locations,
and return-pc information extracted for use by this hypothetical
sophisticated walker?

shap