[LLVMdev] Interfacing llvm with a precise, relocating GC

Mon Oct 28 20:51:16 PDT 2013

Sanjoy: This document, which I wrote several years ago, may be of some use to
you:

Building a Stack Crawler in LLVM
<https://docs.google.com/document/d/1-ws0KYo47S0CgqpwkjfWDBJ8wFhW_0UYKxPIJ0TyKrQ/edit?usp=sharing&authkey=COD8_LcL>

I have successfully implemented a copying collector using LLVM. I did not
implement support for interior pointers; however, I have a number of ideas
on how to approach it. The language I was implementing was similar to C#
in that classes were divided into "reference" types and "value" types. A
pointer to a reference type on the heap always pointed to the start of an
allocation, whereas a pointer to a value type was always an interior
pointer. (In other words, a value type could never exist by itself in the
heap; it had to be embedded within some reference type.) This meant that
the compiler always knew whether a pointer was an interior pointer or not.
Thus, pointers to reference types were just machine-level pointers,
whereas pointers to value types consisted of a pointer+offset, with the
pointer part pointing to the start of a heap allocation. Combined with the
fact that pointers to value types were relatively rare, this allowed
interior pointers with a minimum of overhead.
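
To make the reference/value split concrete, here is a minimal C sketch of
the pointer+offset representation. It assumes a Cheney-style copying
collector that leaves a forwarding pointer in every object it has already
copied; the Object header, its forward field, InteriorPtr and the helper
functions are hypothetical names for illustration, not code from the
original implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header at the start of every heap allocation (a "reference
   type"). Value types never appear on their own; they are embedded inside
   some Object. */
typedef struct Object {
    struct Object *forward; /* assumed: set by the collector once copied */
    size_t size;            /* total allocation size in bytes */
} Object;

/* A pointer to a reference type is just an Object *.
   A pointer to an embedded value type is a base + offset pair, where base
   always points at the start of a heap allocation. */
typedef struct InteriorPtr {
    Object *base;
    size_t offset; /* byte offset of the value within *base */
} InteriorPtr;

/* Turn the pair into a machine address for loads and stores. */
static void *interior_addr(InteriorPtr p) {
    return (char *)p.base + p.offset;
}

/* When the collector moves an object, only the base needs updating: the
   whole allocation is copied, so the offset stays valid. */
static void interior_relocate(InteriorPtr *p) {
    assert(p->base != NULL);
    if (p->base->forward != NULL)
        p->base = p->base->forward;
}
```

Because the base is always a start-of-allocation pointer, the collector can
trace it exactly like any other object reference; the offset is plain data
it never has to interpret.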

Now, having said all that, I feel compelled to give a few warnings. Part of
the reason I abandoned this project was the limitations of LLVM's garbage
collection intrinsics, which I have written about extensively on this list.
The current llvm.gcroot strategy requires a very complex frontend that
generates highly inefficient code: every root has to live in a stack slot
(an alloca), so the frontend must spill and reload it around any call that
might trigger a collection. That code is also mostly unoptimizable, since
LLVM's optimizers generally won't touch a value that has been marked as a GC
root.

Worse, support for GC in the LLVM community is fairly low: the garbage
collection intrinsics in LLVM have not been updated or improved in the 7
years I have been following the project, and there has been very little
discussion of GC on the mailing list. (I search the LLVM archives for the
word "collect" about once a month, which is how this thread came to my
attention.) Most of the people working on or with LLVM are working with
non-GC languages, or with languages whose memory models are simple enough
(e.g. "everything is an atom") that the existing intrinsics are sufficient.
There are also a few people who have worked around the problems by defining
their own stack frames instead of using the LLVM intrinsics.
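
For readers unfamiliar with the "own stack frames" workaround, here is a
minimal C sketch of a frontend-managed shadow stack, assuming a
single-threaded program: each function links a record listing the addresses
of its GC-pointer slots into a global chain on entry and unlinks it on exit,
and the collector walks the chain to find (and, if it moves objects,
rewrite) every root. All names are hypothetical. LLVM's built-in
"shadow-stack" GC strategy follows a similar idea, but this is not its
actual data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frontend-managed shadow stack (not LLVM's own layout).
   One record per active function invocation; the record itself lives in
   the function's native stack frame, so pushing and popping is cheap. */
typedef struct ShadowFrame {
    struct ShadowFrame *prev; /* caller's record, or NULL at the bottom */
    size_t num_roots;
    void ***roots;            /* roots[i] is the address of a GC-pointer slot */
} ShadowFrame;

/* Top of the chain (single-threaded sketch; a real runtime would keep
   one per thread). */
static ShadowFrame *shadow_top = NULL;

/* Called in the function prologue. */
static void shadow_push(ShadowFrame *f) {
    f->prev = shadow_top;
    shadow_top = f;
}

/* Called in the epilogue; frames must be popped in LIFO order. */
static void shadow_pop(ShadowFrame *f) {
    assert(shadow_top == f);
    shadow_top = f->prev;
}

/* The collector visits every root slot of every active frame. A relocating
   collector's visitor may store the object's new address through the slot,
   which is why the frame records slot addresses rather than values. */
static void shadow_visit_roots(void (*visit)(void **slot)) {
    for (ShadowFrame *f = shadow_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->num_roots; i++)
            visit(f->roots[i]);
}
```

A function with two GC pointers a and b would declare
`void **slots[2] = { &a, &b };`, build `ShadowFrame frame = { NULL, 2, slots };`,
call shadow_push(&frame) on entry and shadow_pop(&frame) before returning.
This keeps the collector independent of LLVM's GC support, at the price of
keeping every root in memory: because the slot addresses escape into a
global chain, the compiler has to reload a and b after any call that might
collect.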

There have been numerous proposals over the years for better GC intrinsics,
but nothing has come out of these discussions so far. There's a good reason
for this: improving the GC support would require a major commitment, since
all of the backend code generators and optimizers would potentially be
affected.

My current favorite GC proposal involves annotating types: that is,
defining a new kind of derived type that is essentially a 2-tuple consisting
of a base type plus GC metadata (the second argument to llvm.gcroot). This
means that "root-ness" would be a property of a type rather than a value,
so root-ness could be propagated automatically to intermediate values and
SSA values through optimization, without the frontend having to do a lot of
spilling and reloading. (Plus, the ability to associate metadata with types
might be useful for other things besides GC.)

On Thu, Oct 24, 2013 at 2:32 PM, Sanjoy Das <sanjoy at azulsystems.com> wrote:

> Hello llvm-dev!
>
> My colleagues and I are currently evaluating llvm's suitability as a
> JIT compiler interfacing with a precise, relocating garbage collector.
> While we couldn't find code or writeups that deal with the issues
> specific to this design goal, it is entirely possible that we missed
> something; we would appreciate references to relevant code or writeups
> that people on this list may be aware of.
>
> As an example, one issue that makes this non-trivial is that llvm (as
> far as we know) is free to manufacture pointers to locations _inside_
> objects, something referred to as a "derived pointer" in some places.
> Since these pointers need to be updated in sync with the objects they
> point into, a relocating GC needs to be aware of them, and the runtime
> needs to be able to read off which registers and stack slots hold such
> pointers at every safepoint. We've looked into llvm's existing GC
> support, and the mechanism it provides does not seem to help in this
> use case.
>
> This email is deliberately terse, but we are more than happy to get
> into details about the approaches we've considered as the discussion
> progresses. Pointers to existing work related to this or similar
> issues are especially welcome.
>
> Thanks!
>
> -- Sanjoy
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
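
One way a runtime can handle the derived pointers Sanjoy describes, sketched
minimally in C: assume the compiler emits, for each safepoint, a table
pairing every slot that holds a derived pointer with the slot holding its
base object. Before objects move, the runtime rewrites each derived pointer
as an offset from its base; after the collector has updated the base slots,
it adds the new base back. DerivedEntry and both functions are hypothetical
names, and slots are modeled as uintptr_t so the arithmetic stays well
defined even if a derived pointer falls outside its object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stack-map entry for one safepoint: a register or stack slot
   holding a derived pointer, and the slot holding the base object it was
   derived from. Slots are modeled as uintptr_t for illustration. */
typedef struct DerivedEntry {
    uintptr_t *base_slot;
    uintptr_t *derived_slot;
} DerivedEntry;

/* Before the collector moves anything: replace each derived pointer with its
   offset from the (old) base. Unsigned wrap-around is harmless because the
   offset is added back with the same modular arithmetic. */
static void derived_to_offsets(const DerivedEntry *e, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(e[i].base_slot != e[i].derived_slot);
        *e[i].derived_slot -= *e[i].base_slot;
    }
}

/* After the collector has rewritten every base slot with the new object
   address: rebuild each derived pointer from its relocated base. */
static void offsets_to_derived(const DerivedEntry *e, size_t n) {
    for (size_t i = 0; i < n; i++)
        *e[i].derived_slot += *e[i].base_slot;
}
```

Splitting the fix-up around the collection means several derived pointers
can share one base slot without any of them seeing a half-updated base. It
also shows why the base has to stay live at the safepoint even if the
program never uses it again: without it there is nothing to re-derive from.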

-- 
-- Talin