[LLVMdev] Improving Garbage Collection

Talin viridia at gmail.com
Wed Jul 6 15:24:51 PDT 2011


*Motivation & Abstract*

It has been observed by a number of LLVM users that the current garbage
collection framework requires frontends to generate complex and inefficient
code. This is primarily due to the fact that only allocas can be stack roots
- it is not possible to trace SSA values directly. Because of this
limitation, a frontend wanting to support garbage collection has to manually
copy any SSA values containing potential roots to memory before each call
instruction (or other 'safe point'), and reload them back into SSA values
after the call instruction.
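
To make that concrete, here is a minimal sketch of the pattern a frontend has
to emit today for a single GC pointer that is live across a call (the
collector name "mygc" and the callee @do_something are made up for
illustration):

   declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)
   declare void @do_something()

   define void @example(i8* %obj) gc "mygc" {
   entry:
     ; one alloca per potential root, registered with llvm.gcroot
     %obj.root = alloca i8*
     call void @llvm.gcroot(i8** %obj.root, i8* null)
     store i8* null, i8** %obj.root      ; initialize to a "safe" value

     ; spill the SSA value before the safe point...
     store i8* %obj, i8** %obj.root
     call void @do_something()
     ; ...and reload it afterwards, since a copying collector may have moved it
     %obj.new = load i8** %obj.root
     ret void
   }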

*Background: How the inability to trace SSA values degrades efficiency.*

Although it would be ideal if there was a way for registers to be traced
directly, in practice this is very difficult, for a number of reasons - for
one thing, it requires the collector to have knowledge of the register file
of the current processor. It is much simpler if all garbage collection roots
can be identified by an address in memory. Doing this requires that before
each safe point, all register values must be spilled to memory.

Although this may seem inefficient, in practice it need not be: since most
processors have a relatively small register file, chances are that only a
few local variables will be live in registers at any given point. Regardless
of how deep the stack is (that is, how many call frames there are and how
many local variables there are in each frame), the most you should ever need
to copy to memory is the entire register file - less if some of the
registers contain non-root values such as integers, or dead values.

An additional optimization can be obtained if we realize that in most
collectors, the *current* call frame can be treated as a special case. A
collection cycle only happens inside one of the collector's runtime utility
functions, and those functions can be written in such a way as to not
require tracing. So it is only the *parent* of the current call frame, and
its ancestors, that need to be traced.

Unfortunately, reaching this theoretical point of maximum efficiency
requires a lot of knowledge about which local variables are in registers at
any given moment, and which values are live. LLVM-based frontends do not, as a
rule, have this knowledge. This forces the frontend to be extremely
conservative and pessimistic - it has to treat every SSA value as if it were
in a register, and copy it to a stack location whether it really needs to or
not. Further, because copying collectors can modify object addresses, the
temporary copy on the stack has to be moved back into an SSA value
afterwards.

The end result is that if you look at the disassembled code for a typical
function, the generated code is a mess - it will have dozens of additional
allocas at the beginning of the function, each with its own call to
llvm.gcroot(), and each initialized to a "safe" value. And each call to a
subroutine is surrounded by a bunch of extra stores and loads. There's so
much extra clutter that the code is hard to read.

*Background: Explicit Lifetimes for Roots*

Another problem with the current LLVM approach for handling stack roots has
to do with the way that you signal to LLVM that a given stack root is going
out of scope - which is by assigning NULL to the pointer. Theoretically,
this extra store shouldn't be necessary, as long as your tracer can handle
the case where different safe points within the same function can have
different stack maps (which is fairly easy to do in the current framework).
That is, if there was a way to communicate to LLVM the region for which a
given stack root was in scope, then the GC strategy could simply remove that
variable from the stack map for all safe points which are outside that
region. No store would be needed.
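
Concretely, under the current scheme the frontend has to emit something like
this at the end of the root's scope (a sketch, reusing %obj.root from the
earlier example):

   ; retire the root by storing the "safe" value; every later safe point
   ; still scans the slot, it just finds null
   store i8* null, i8** %obj.root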

*Background: What is a root?*

Many language runtimes make the assumption that all roots are simple
pointers. For example, in Java, there are really only two kinds of data:
primitive numerical types (floats and ints), and objects. Furthermore, all
objects are type-tagged, so tracing is very simple: If you see a pointer,
and it's non-null, trace it.

However, some languages have a richer set of internal data types, which can
impact garbage collection. Three fairly common examples come to mind:

   - LISP - a pointer field can either hold an actual pointer or a small
   integer. The least significant bit of the pointer is used to determine which
   type it is.
   - Tagged unions - many languages support "tagged unions" or "disjoint
   types". These generally consist of a small struct containing a bitfield and
   a "payload" area, where the payload area is as large as the largest type in
   the union. The bitfield is used to determine what type is currently stored
   in the payload - this affects garbage collection because the collector needs
   to examine the bitfield in order to determine if the payload contains
   pointers or not.
   - Variable-length arrays. These generally contain a 'size' field that
   determines how many elements are in the array. If the array contains
   traceable objects, the collector needs to examine the 'size' field in order
   to determine how many values to trace. (While it is possible that you could
   get around this by initializing unused array elements to null, in practice
   this would slow down many array operations.)

Currently LLVM puts no restriction on the type of data that can be a root -
in other words, a root is "whatever the frontend says is a root".
Furthermore, the llvm.gcroot() intrinsic allows arbitrary metadata to be
associated with each root, making it relatively easy to handle non-tagged
data types - the frontend can use this metadata argument to pass a table of
descriptors which tells the collector how to trace that specific value.

Take the case of tagged unions as an example. We can declare the entire
tagged union struct to be a "root", meaning that what gets passed to the
collector is not the address of the payload area, but the address of the
whole union struct. The collector knows how to examine the bitfield and
trace the payload area because of the metadata that is associated with that
root - it won't assume that the root address is the address of a pointer.
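
As a hedged sketch of what this looks like with today's intrinsic (the union
layout, the collector name and the descriptor global are all invented for
illustration), note that the alloca is not a pointer and that the descriptor
is passed as the metadata argument:

   %TaggedUnion = type { i8, [8 x i8] }      ; tag byte + payload area
   @TaggedUnion_desc = external constant i8  ; frontend-defined trace descriptor

   declare void @llvm.gcroot(i8** %ptrloc, i8* %metadata)

   define void @use_union() gc "mygc" {
   entry:
     %u = alloca %TaggedUnion
     %u.cast = bitcast %TaggedUnion* %u to i8**
     ; the whole union struct is the root; the metadata tells the collector
     ; how to read the tag and decide whether the payload holds a pointer
     call void @llvm.gcroot(i8** %u.cast, i8* @TaggedUnion_desc)
     ret void
   }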

Now, in practice, the only LLVM data types that will ever be roots are
pointers and structs. In the latter case, this will most often be structs
containing pointers, but this won't always be the case (Imagine a tagged
union containing both pointer and non-pointer members, but whose largest
member contains no pointers.)

In general, it would be best if LLVM treated roots opaquely - that is, it
should not attempt to interpret the contents of the root, but should instead
just pass it through to the collector.

*Overall Proposal: Support marking SSA values as roots (an evolutionary
approach)*

My proposal consists of three rather significant changes to LLVM:

   - Allow frontends to mark SSA values - or even portions of SSA values -
   as stack roots.
   - For alloca roots, add a way to communicate to LLVM when a root goes out
   of scope.
   - Transfer the responsibility for copying stack roots to memory, from the
   frontend to the LLVM code generators.

Let me take the last point first.

*Proposal Pt 3: Copying Stack Roots to Memory*

The LLVM code generators and analysis passes have a much more thorough
knowledge of SSA value lifetimes than frontends do, and therefore could
avoid spilling and reloading values when it isn't needed. So the LLVM
libraries should take over the job of creating local allocas for holding SSA
values during a safe point, as well as the job of copying those SSA values
to those allocas.

Of course, generating "optimal" code (as outlined in the previous section)
would require a lot of work on the backends. But we don't need to do that
right away. The first goal should be a "conservative, pessimistic"
implementation that's no better or worse than what we have today (other than
being far easier to use). It might be possible to do this as some sort of
stand-alone transformation/lowering pass.

This is what I mean by an evolutionary approach - come up with the right
interface between the frontends and LLVM that enables us to gradually move
towards that eventual goal.

*Proposal Pt 1: Marking SSA roots*

In order for the backend to be able to properly handle SSA roots, we need
some way to communicate to LLVM which values are roots. Currently the way
that this is done is via the llvm.gcroot() intrinsic. The intrinsic itself
actually does nothing - rather, the presence of the intrinsic instruction is
used as a "marker" that is recognized by certain LLVM analysis passes.

One approach would be to extend the current scheme to work with SSA values,
possibly using a new intrinsic. This is somewhat problematic because you
can't really have a 'do-nothing' function on SSA values - the result of the
function is always going to be a copy of the value, not the original value
that was passed in.
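
For illustration only (no such intrinsic exists; the name, the %object type
and the @desc descriptor are invented), a gcroot-style marker for SSA values
would have to look something like the following, and the frontend would then
have to use %marked everywhere in place of %obj - which is exactly the copy
problem described above:

   ; the call must return *something*, and what it returns is a new SSA
   ; value rather than the value that was marked
   %marked = call %object* @llvm.gcvalue(%object* %obj, i8* @desc)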

A different approach would be to somehow annotate an LLVM type with
information about its 'root-ness'. In one sense, this is logical, because it
mirrors what the frontend does - using the type to determine how the garbage
collector should treat an object.

However, annotating types has some major problems:

   - LLVM normally folds structurally identical types together into a single
   type. However, you might have two types which are structurally identical
   when translated into IR, but which are supposed to be treated differently
   from a GC standpoint. The only way to avoid this is to make the annotations
   part of the type's signature - to treat types having different annotations
   as unique.
   - For garbage collection we need at least 1 bit and 1 pointer per type
   annotated. (Actually, it's technically possible that we could do without the
   pointer, which is used to store the metadata, but it would make a frontend
   developer's life considerably harder.) Unfortunately, increasing the size of
   every type by that much would be a huge cost.
   - Pointers in LLVM have an "address space" attribute, from which we can
   steal a few bits. Unfortunately, this has two problems: Not all roots are
   pointers, and a few bits aren't really enough.

Given that, there are a few different options I can think of - none of which
I completely like, but which are better than anything else I've come up with
so far.

*Option 1: Make type annotations first-class types in LLVM*

The idea here is that you would have a new kind of type expression in LLVM,
which "wraps" an existing type and adds an annotation to it. I'm going to
randomly pick a character that isn't used yet in the LLVM ASCII
serialization format (#) and craft an example of what this might look like:


   %mytype = type #( { %mytype*, i32 }, i8* %annotation )

So basically this is saying that 'mytype' is an annotated struct - you can
use it just like you would use a struct, but it's a distinct type that will
not be folded with other structs, unless they have an identical annotation.

Multiple annotations can be handled either by nested annotations (one
annotation wrapping another), or by allowing the annotation construct to
take multiple arguments.

You can have structs that are roots embedded inside non-root structs (this
applies to options 2a and 2b as well):

   %mytype = type { i32, #( { %mytype*, i32 }, i8* %annotation ) }

The way to interpret this is as follows: If you have an SSA value whose type
is "mytype" then only the second member needs to be copied to memory during
a safe point, and the address passed to the garbage collector is the address
of that member only.
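
In code-generator terms, the effect at a safe point would be roughly the
following (a sketch only - the spill would actually be produced internally by
the backend, and the extractvalue here treats the annotated member as its
underlying struct type):

   ; given %val of type %mytype (second definition above), spill only the
   ; annotated member around the safe point
   %part.slot = alloca { %mytype*, i32 }
   %part = extractvalue %mytype %val, 1
   store { %mytype*, i32 } %part, { %mytype*, i32 }* %part.slot
   ; %part.slot is the address reported to the collector for this safe point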

The first advantage of this approach is that it's "pay for what you use" -
only people who use annotations have to pay the memory cost for them.
A second advantage of this approach is that you could use it for other
things besides garbage collection - const, volatile, thread-local, and any
other value attributes you can think of.

The big downside here is that you would have to go and update the LLVM code
base to "unwrap" type expressions - that is, any function that takes an LLVM
type as an argument now has to potentially strip off an arbitrary number of
annotations before it can work with the type. This is a vast amount of work.

*Option 2a: Allow annotations for Structs only*

Chris Lattner's proposal for named struct types has a fortunate side effect,
which is that we can now control whether types are folded or not.
Unfortunately, this only applies to structs, and not pointers, which is the
other type we are interested in. However, we can convert any arbitrary LLVM
type into a struct by simply wrapping it in one.

So let's say that in addition to allowing structs to be named, we also allow
arbitrary annotations to be added. (In fact, the struct name could be
implemented as one type of annotation, so we're really only adding one extra
field.) We make sure that any struct that is a root has a unique name, and
add the appropriate 'root' annotation to it.

This means that any garbage-collectable pointer type such as X* now has to be
transformed into { X* }. This shouldn't affect anything except that now you
have to add an extra 0 to your GEP expressions. Ah well, such is life.
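
For example (a sketch; %X and the surrounding object layout are made up), a
field that used to be a bare %X* becomes the wrapper struct, and the address
computation gains one trailing zero:

   %X     = type opaque
   %gcref = type { %X* }             ; uniquely named wrapper carrying the 'root' annotation
   %Obj   = type { i32, %gcref }

   define %X* @load_field(%Obj* %o) {
   entry:
     ; before wrapping:  getelementptr %Obj* %o, i32 0, i32 1
     ; after wrapping, an extra 0 reaches through the wrapper:
     %slot = getelementptr %Obj* %o, i32 0, i32 1, i32 0
     %p = load %X** %slot
     ret %X* %p
   }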

*Option 2b: Annotations on structs via an auxiliary data structure.*

Instead of adding the ability to annotate struct types, we build a parallel
data structure which maps the name of the struct to its garbage collection
metadata (a possible encoding is sketched after the list below). The upside of
this is that it requires the fewest changes to LLVM, but it has a few
downsides:

   - You still have to unwrap pointers before using them (via an extra 0 to
   GEP.)
   - This data structure needs to be serialized as part of the module.
   - The information about the struct is no longer all in one place; the
   various backend passes that know about stack roots now have to correlate
   information from two different sources.
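
One possible encoding of that parallel structure (a sketch only - the named
metadata key and the descriptor format are invented for illustration) is
module-level named metadata mapping struct names to their GC descriptors:

   %TaggedUnion = type { i8, [8 x i8] }
   @TaggedUnion_desc = external constant i8

   ; hypothetical named metadata: struct name -> descriptor
   !gc.struct.roots = !{!0}
   !0 = metadata !{metadata !"TaggedUnion", i8* @TaggedUnion_desc}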

*Proposal Pt 2: Marking SSA roots*
*
*
The final part of the proposal is to allow the frontend to tell LLVM when an
alloca variable has gone out of scope. This part is easy - just add another
intrinsic, similar to llvm.gcroot, which is used as a marker.
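
For example (the intrinsic name is purely illustrative - nothing like it
exists today), it could mirror llvm.gcroot and take the same alloca:

   declare void @llvm.gcroot.end(i8** %ptrloc)   ; hypothetical marker

     ...
     ; safe points after this call would simply leave %obj.root out of their
     ; stack maps, so no null store is required
     call void @llvm.gcroot.end(i8** %obj.root)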

-- 
-- Talin