[LLVMdev] Proposal for improving llvm.gcroot (summarized)

Wed Mar 30 11:08:39 PDT 2011

(This is a summary of the big long thread on llvm.gcroot, for those who
didn't have time to read it.)

I'm proposing the replacement of llvm.gcroot() with three new intrinsics:

   - *llvm.gc.declare*(alloca, meta). This intrinsic marks an alloca as a
   garbage collection root. It can occur anywhere within a function, and lasts
   either until the end of the function or until a matching call to
   llvm.gc.undeclare().
   - *llvm.gc.undeclare*(alloca). This intrinsic unmarks an alloca, so that
   it is no longer considered a root from that point onward.
   - *llvm.gc.value*(value, meta). This intrinsic marks an SSA value as a
   root. The SSA value can be any type, not necessarily a pointer. This marking
   lasts for the lifetime of the SSA value.

The names of the intrinsics are intended to follow the naming convention for
declaring debug variables (llvm.dbg.declare and llvm.dbg.value).
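To make the lifetime rules of declare/undeclare concrete, here is a small C++ model (not LLVM code) of how a backend might work out which alloca roots are live at each safe point. It assumes straight-line code with no branches, and the Event/EventKind types and the liveRootsAtSafePoints() function are hypothetical illustrations only.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the proposed llvm.gc.declare / llvm.gc.undeclare
// semantics: a declared root stays live until a matching undeclare or until
// the end of the function. Roots are identified by the name of their alloca
// (an illustrative simplification). Only straight-line code is modeled.
enum class EventKind { Declare, Undeclare, SafePoint };

struct Event {
  EventKind kind;
  std::string alloca;  // empty for SafePoint events
};

// Returns, for each safe point in order, the set of alloca roots that the
// GC strategy would need to be told about at that point.
std::vector<std::set<std::string>> liveRootsAtSafePoints(const std::vector<Event> &events) {
  std::set<std::string> live;
  std::vector<std::set<std::string>> result;
  for (const Event &e : events) {
    switch (e.kind) {
    case EventKind::Declare:
      live.insert(e.alloca);
      break;
    case EventKind::Undeclare:
      // An undeclare is expected to match an earlier declare.
      assert(live.count(e.alloca) == 1 && "undeclare without matching declare");
      live.erase(e.alloca);
      break;
    case EventKind::SafePoint:
      result.push_back(live);
      break;
    }
  }
  // Roots still in `live` here simply remain live until the function returns.
  return result;
}
```

For example, declaring `a`, hitting a safe point, declaring `b`, hitting another safe point, undeclaring `a` and hitting a third safe point yields the live sets {a}, {a, b} and {b}.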

The llvm.gc.declare() and llvm.gc.value() intrinsics do essentially the same
thing: At each safe point, they make the first argument available to the GC
strategy as a pointer, using whatever means is most efficient from a code
generation standpoint. In the case of llvm.gc.declare(), which takes an
alloca as its first argument, this is the same as what llvm.gcroot() does
now, and is fairly straightforward: the GC strategy gets the address of the
alloca'd root.

In the case of llvm.gc.value(), providing a pointer to the GC strategy is
more involved, since the value may be in a register or split across several
registers. In some cases, it may be required to spill the value into memory
during safe points, and re-load it afterwards. In many cases, calling a
function will require saving the SSA value on the stack regardless, so it
may be possible to determine a pointer to that stack location.
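A rough C++ sketch of the spill-and-reload idea for llvm.gc.value(): the value, modeled as a two-word FatPointer struct that a code generator might keep in two registers, is copied into a stack slot for the duration of the safe point, the GC strategy is handed a pointer to that slot, and the possibly-updated contents are read back afterwards. FatPointer, SafePointVisitor and callWithSpilledRoot() are hypothetical names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical: a root held in an SSA value, modeled as a small struct that
// might be split across two registers.
struct FatPointer {
  void *base;
  std::uintptr_t length;
};

// Invoked at a safe point with the addresses of all roots. The name and
// signature are illustrative only.
using SafePointVisitor = std::function<void(std::vector<void *> &roots)>;

FatPointer callWithSpilledRoot(FatPointer inRegisters, const SafePointVisitor &gc) {
  FatPointer spillSlot = inRegisters;  // spill: copy the register value to memory
  std::vector<void *> roots{&spillSlot};
  gc(roots);                           // the GC sees a pointer to the stack copy
  return spillSlot;                    // reload: pick up any update the GC made
}
```

The reload step matters for a moving collector: if the GC relocates the object during the safe point, it rewrites the spilled copy, and the code after the safe point continues with the new address.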

The llvm.gc.undeclare() intrinsic is used to indicate the end of the lifetime
of an alloca root. This replaces the current convention of assigning NULL to
a root to indicate the end of its lifetime. This has two advantages: First,
it avoids the extra store, and second, it allows the backend code generator
to re-use the same stack slots for different roots, as long as their
lifetimes don't overlap. (Under the current scheme, the lifetime of a root
is required to be the whole function body.)
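The stack-slot reuse argument can be illustrated with a simple greedy interval-coloring pass in C++. This sketch assumes each root's lifetime is a half-open range of instruction indices from its llvm.gc.declare to its llvm.gc.undeclare; RootLifetime and assignStackSlots() are hypothetical, and this is not how LLVM's own stack coloring is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical: a root's lifetime as the half-open interval [start, end).
// A root that is never undeclared would get end = length of the function.
struct RootLifetime {
  std::size_t start;  // index of the llvm.gc.declare
  std::size_t end;    // index of the llvm.gc.undeclare (exclusive)
};

// Greedy interval coloring: roots whose lifetimes don't overlap share a slot.
// Returns a slot index for every root, in input order.
std::vector<std::size_t> assignStackSlots(const std::vector<RootLifetime> &roots) {
  std::vector<std::size_t> order(roots.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return roots[a].start < roots[b].start;
  });

  using Busy = std::pair<std::size_t, std::size_t>;  // (end, slot)
  std::priority_queue<Busy, std::vector<Busy>, std::greater<Busy>> active;
  std::vector<std::size_t> freeSlots;
  std::size_t nextSlot = 0;
  std::vector<std::size_t> slotOf(roots.size());

  for (std::size_t idx : order) {
    // Release slots whose roots were undeclared at or before this declare.
    while (!active.empty() && active.top().first <= roots[idx].start) {
      freeSlots.push_back(active.top().second);
      active.pop();
    }
    std::size_t slot;
    if (!freeSlots.empty()) {
      slot = freeSlots.back();
      freeSlots.pop_back();
    } else {
      slot = nextSlot++;
    }
    slotOf[idx] = slot;
    active.push({roots[idx].end, slot});
  }
  return slotOf;
}
```

With lifetimes [0,5), [5,10) and [2,8), the first two roots share slot 0 and only two slots are needed; if all three lifetimes run to the end of the function, as under the current scheme, three slots are required.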

In all cases, LLVM should not make any assumptions about the type of the
value argument with respect to garbage collection, and should treat it as a
black box to the extent possible. The value may or may not contain pointers,
and it may or may not contain non-pointer fields. It will be up to the GC
strategy to take the appropriate action based on the data type and the meta
argument.
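As an illustration of this black-box treatment, here is a C++ sketch of how a GC strategy might use the meta argument to find pointers inside an opaque root. It assumes a frontend-defined meta descriptor (the hypothetical RootMeta struct) that lists the byte offsets of pointer fields; traceRoot() is likewise hypothetical, and LLVM itself would attach no meaning to meta.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical frontend-defined metadata: byte offsets of pointer fields
// within the root. An empty list describes a pointer-free value.
struct RootMeta {
  std::vector<std::size_t> pointerOffsets;
};

// Visits every pointer stored inside the root and lets `update` replace it,
// as a copying collector would. memcpy avoids alignment/aliasing assumptions.
template <typename UpdateFn>
void traceRoot(void *root, const RootMeta &meta, UpdateFn update) {
  auto *bytes = static_cast<unsigned char *>(root);
  for (std::size_t off : meta.pointerOffsets) {
    void *ptr;
    std::memcpy(&ptr, bytes + off, sizeof ptr);
    ptr = update(ptr);
    std::memcpy(bytes + off, &ptr, sizeof ptr);
  }
}
```

For a root such as `struct { long tag; void *obj; }`, meta would list only `offsetof(..., obj)`, so the non-pointer `tag` field is never touched.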

One open issue is whether formal function arguments (which are normally
treated as SSA values) can be passed as arguments to llvm.gc.value(). From
the standpoint of a user, this would be very convenient to have, but if it's
too difficult, then it can be worked around by copying the function
parameters to local SSA values.

Now, I realize that there were several strong supporters of a competing
proposal involving using the address-space field of pointers in LLVM. I
won't go into the details here, except to say two things: (1) I believe that
approach limits the generality of LLVM's support for diverse collectors, and
(2) in the original thread, the folks who supported my proposal tended to be
people who were actual users of the current system, or who planned on using
it in the near future.

-- 
-- Talin