[LLVMdev] Improving GC intrinsics in LLVM

Mon Nov 11 15:28:31 PST 2013

A couple of people have asked about this, so I wanted to describe my proposal
for improving LLVM's garbage collection intrinsics.

Currently the llvm.gcroot intrinsic associates a set of GC metadata with a
value, specifically a value produced by an alloca instruction. In the
proposed scheme, GC metadata would instead be associated with a *type*.

The most general approach would be to introduce a new derived type called
AnnotatedType, which is essentially a tuple consisting of (base type, id,
metadata). The base type can be any valid LLVM type, including another
AnnotatedType. The 'id' argument is an integer constant that is used to
determine what kind of annotation this is. The 'metadata' argument is a
const void* pointer, similar to the existing llvm.gcroot() argument, which
can point to any constant data structure.

There would be a fixed set of annotation ids, one of which would be
GC_ROOT. Other kinds of type annotations could be introduced later for
other purposes.

AnnotatedType instances would be folded/uniqued just like all other
derived types.
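
To make the tuple and the uniquing concrete, here is a rough sketch in
C++. None of these classes exist in LLVM: Type, AnnotatedType, AnnotationId
and TypeContext are hypothetical stand-ins invented for illustration, and a
real implementation would live inside LLVM's own type system and context.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <tuple>

// Hypothetical stand-ins; these are NOT real LLVM classes.
enum class AnnotationId : std::uint32_t { GC_ROOT = 0 };

struct Type {
  virtual ~Type() = default;
  virtual bool isAnnotated() const { return false; }
};

// The (base type, id, metadata) tuple described above.
struct AnnotatedType final : Type {
  const Type *Base;      // any type, including another AnnotatedType
  AnnotationId Id;       // which kind of annotation this is
  const void *Metadata;  // opaque pointer to constant data, may be null

  AnnotatedType(const Type *B, AnnotationId I, const void *M)
      : Base(B), Id(I), Metadata(M) {}
  bool isAnnotated() const override { return true; }
};

// Owns annotated types and hands out one instance per distinct tuple.
class TypeContext {
  using Key = std::tuple<std::uintptr_t, AnnotationId, std::uintptr_t>;
  std::map<Key, std::unique_ptr<AnnotatedType>> Annotated;

public:
  const AnnotatedType *getAnnotated(const Type *Base, AnnotationId Id,
                                    const void *Metadata) {
    assert(Base && "base type must not be null");
    Key K{reinterpret_cast<std::uintptr_t>(Base), Id,
          reinterpret_cast<std::uintptr_t>(Metadata)};
    std::unique_ptr<AnnotatedType> &Slot = Annotated[K];
    if (!Slot)  // first request for this tuple: create it
      Slot = std::make_unique<AnnotatedType>(Base, Id, Metadata);
    return Slot.get();
  }
};
```

With this shape, asking twice for the same (base, id, metadata) yields the
same pointer, so annotated types can be compared by address like other
uniqued types.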

LLVM optimizers and code generators would need to be modified to unwrap the
base type whenever an annotated type was encountered. Ideally,
transformations on values and types would preserve the annotations where
possible. For example, when loading a memory location into an SSA value, if
the type of the source memory location has an annotation, then the SSA
value's type would have the same annotation.

In some cases, the presence of the annotation would disable certain kinds
of backend transformations on those values.

Annotations would not be preserved through dereference. Instead, the
contained type must have its own annotation if that is desired.
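
The two propagation rules above (a load keeps the annotation, a dereference
does not) might look something like the following. This is only a sketch
under my own assumptions: the Type struct and the helper names
(stripAnnotations, isGCRoot, loadResultType, dereferencedType) are made up
for illustration and are not part of any LLVM API.

```cpp
#include <cstdint>

// Hypothetical, simplified type model; not LLVM's real type system.
enum class AnnotationId : std::uint32_t { GC_ROOT = 0 };

struct Type {
  enum Kind { Scalar, Pointer, Annotated } K;
  const Type *Inner;     // pointee for Pointer, base for Annotated, else null
  AnnotationId Id;       // only meaningful when K == Annotated
  const void *Metadata;  // only meaningful when K == Annotated
};

// Unwrap to the base type, as optimizers and code generators would.
const Type *stripAnnotations(const Type *T) {
  while (T->K == Type::Annotated)
    T = T->Inner;
  return T;
}

// True if any annotation layer on T is GC_ROOT.
bool isGCRoot(const Type *T) {
  for (; T->K == Type::Annotated; T = T->Inner)
    if (T->Id == AnnotationId::GC_ROOT)
      return true;
  return false;
}

// Loading from a slot: the SSA value gets the slot's type unchanged,
// so the annotation carries over.
const Type *loadResultType(const Type *SlotTy) { return SlotTy; }

// Dereferencing a pointer: the annotation on the pointer is dropped and
// the pointee is returned as-is. It is a root only if it carries its own
// annotation. Returns null if the base type is not a pointer.
const Type *dereferencedType(const Type *PtrTy) {
  const Type *Raw = stripAnnotations(PtrTy);
  return Raw->K == Type::Pointer ? Raw->Inner : nullptr;
}
```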

Most likely, different annotation types would have different rules for
propagating onto intermediate and derived values. (This is why the set of
annotation ids is fixed rather than open-ended.)

The ultimate consumer of the GC_ROOT annotation would be a GCStrategy pass
much like today. The GCStrategy pass would, for each function, be able to
iterate through all local values (either allocas or SSA values) that were
annotated with the GC_ROOT annotation. It could then generate whatever
stack maps or register maps it wanted. It could use the metadata argument
of the annotated type, or ignore it, depending on the specifics of the GC
algorithm used.
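
As a very rough illustration of what the consumer side could look like, the
sketch below walks a function's local values and records the ones annotated
as roots. LocalValue, RootEntry and collectRoots are hypothetical names of my
own; the real GCStrategy interface is different, and a real pass would query
the value's type rather than a boolean flag.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one local value in a function.
struct LocalValue {
  std::string Name;
  bool IsAlloca;             // true for a stack slot, false for an SSA value
  bool HasGCRootAnnotation;  // stand-in for "type carries GC_ROOT"
  const void *Metadata;      // metadata from the annotated type, may be null
};

// One entry of the stack map / register map the strategy emits.
struct RootEntry {
  std::string Name;
  bool InMemory;         // stack slot vs. register candidate
  const void *Metadata;  // passed through; the collector may ignore it
};

// Visit every local, keeping only those annotated as GC roots.
std::vector<RootEntry> collectRoots(const std::vector<LocalValue> &Locals) {
  std::vector<RootEntry> Map;
  for (const LocalValue &V : Locals)
    if (V.HasGCRootAnnotation)
      Map.push_back({V.Name, V.IsAlloca, V.Metadata});
  return Map;
}
```

The point is just that allocas and SSA values are handled uniformly: both
are found by their annotation, not by a separate llvm.gcroot call.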

Motivations: There are several advantages to this approach. First, it would
be much less work for the frontend developer, since intermediate values
(and particularly SSA values) would automatically inherit the "root" aspect
from their source values. Thus, no more spilling and reloading of SSA
values around safe points (spilling might still happen, but the frontend
developer would not need to be aware of it). Second, optimization passes
would be able to handle roots more intelligently: instead of merely
forbidding any optimizations on roots at all, they would be able to
preserve roots through the transformation. An obvious example of this is
mem2reg - it could convert memory locations which are roots into SSA values
which are roots.

A final note: This is not something that I can personally work on, since I
know very little about optimization and code generation. I'm mainly a
parser / interpreter guy. This is more of a wish :)

-- 
-- Talin