[LLVMdev] Extending GC infrastructure for roots in SSA values

Sat Dec 29 17:54:56 PST 2012

First of all, thanks for looking into this! As you've no doubt discovered,
I'm one of the people who has talked a lot about this issue in the past,
and have been frustrated with the lack of progress in this area.

I completely agree with your point about wanting to be able to attach GC
metadata to a type (rather than attaching it to a value, as is done now).
In the past, there have been two objections to this approach: first, the
overhead that would be added to the Pointer type - the vast majority of
LLVM users don't want to have to pay an extra 4-8 bytes per Pointer type.
And second, that all of the optimization passes would have to be updated so
as to not do illegal transformations on a GC type.

A different approach would be to create a new kind of derived type that
associates metadata with an existing type. This "AnnotatedType" would be
essentially a tuple (type, metadata), and would be constant-folded just
like other types are. Just like the existing GC intrinsics today, there
would be some way for a post-optimization pass to get access to all of the
stack variables an examine the annotations on each to determine how to
construct the appropriate static data structures.

This approach has both a number of advantages and a number of challenges.
The first advantage is that it means that LLVM users who aren't interested
in GC would pay nothing. A second advantage is that this could also be used
to wrap types that are not pointers. One use case I have is being able to
handle a discriminated union type which sometimes holds a pointer and
sometimes doesn't. The existing intrinsics allow this use case (within the
limitations that you point out) - the way it works is that the entire
struct is considered a "root" and is passed through to the GC plugin, which
generates code to look at the discriminator field and decide whether to
trace it or not. However, I'm also aware of the fact that I seem to be the
only one who is interested in this particular case, so I won't strongly
object if your solution doesn't handle it, as long as the existing
intrinsics continue to work.

The challenge of this approach is that a lot of backend code will need to
unwrap the annotated type in order to operate upon it, and it would be all
too easy to discard the associated metadata as part of this process.

On Fri, Dec 28, 2012 at 1:09 PM, Benjamin Saunders <ralith at gmail.com> wrote:

> I'm working on an LLVM backend for Idris, a garbage-collected pure
> functional programming language, and have experienced some frustration
> that LLVM's GC support, specifically with regard to mapping roots,
> operates only on allocas. This entails a lot of otherwise unnecessary
> stack allocation (especially in a pure language, where in-place
> mutation is rare) and imposes limitations on what optimizations can be
> applied. Other LLVM users have used elaborate workarounds to this,
> such as Rust's use of address spaces and, I believe, GHC's specialized
> calling convention. I'm interested in extending LLVM to support GC
> roots in regular SSA values, but, that being a significant change,
> it's clear that some discussion is in order before diving in if I want
> to get such a patch merged.
> This topic has been discussed on multiple previous occasions, and in
> each case nothing seems to have come of it, though interest appears to
> be significant. In particular, concerns with how such infrastructure
> could be made to abide by the invariants of arbitrary GC algorithms
> seem to have stayed hands. It's not clear to me why that poses a
> problem--if the property of being a GC root is correctly propagated
> through all manipulations of a pointer, and that information tracked
> through register allocation and made available to the GC metadata
> printer, won't the the resulting system have no more limitations or
> constraints than the current one? A copying collector would, having a
> complete list of root locations, still be able to rewrite them; a
> mark-and-sweep collector would still be able to find everything in
> need of marking; and so on.
> If my understanding above is correct, then perhaps the challenge lies
> in correctly propagating the marking of a pointer in an SSA value as a
> root through transforms. The pattern of propagation desired seems
> identical to that of type information, so perhaps it would be best to
> make the marking of a pointer as a GC root an attribute of its type,
> much the way address spaces already work? Recall again Rust's approach
> here, where the behavior of address space information through
> transforms is exactly what is relied upon.
> It's easy to imagine a GC root flag on pointer types, but one still
> needs to attach metadata to enable tagless GC as supported by the
> existing infrastructure. Rust simply encodes this information into the
> address space number; a similar approach could be envisioned with a
> 'GC type ID' number that could be used by the GC metadata printer to
> look up detailed information in e.g. module-level metadata, but this
> is a bit awkward; it would be nice to have an interface at least as
> convenient as the current intrinsic is. If the metadata is uniqued so
> as not to break type equality and uniquing, would it be viable to have
> the GCd pointer type itself refer to a metadata node?
> Finally, is there anything else that needs consideration before attempting
> this?
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121229/0e16de22/attachment.html>