[LLVMdev] Extending GC infrastructure for roots in SSA values

Sun Dec 30 16:28:38 PST 2012

On Sat, Dec 29, 2012 at 5:54 PM, Talin <viridia at gmail.com> wrote:
> First of all, thanks for looking into this! As you've no doubt discovered,
> I'm one of the people who has talked a lot about this issue in the past, and
> have been frustrated with the lack of progress in this area.

Yeah, I spent some time digging through the archives. Frankly, I'm
surprised something that would be clearly useful for so many people
hasn't had someone else step up before, but I'm happy to be that
person.

> I completely agree with your point about wanting to be able to attach GC
> metadata to a type (rather than attaching it to a value, as is done now). In
> the past, there have been two objections to this approach: first, the
> overhead that would be added to the Pointer type - the vast majority of LLVM
> users don't want to have to pay an extra 4-8 bytes per Pointer type. And
> second, that all of the optimization passes would have to be updated so as
> to not do illegal transformations on a GC type.

Is the added memory cost a realistic concern, bear in mind that types
are uniqued?

Your comment about optimization passes evokes what I've read in past
discussions, and I don't understand it any better now. Can you
describe some examples of illegal transformations that would occur if
the current optimization passes were left unchanged?

> A different approach would be to create a new kind of derived type that
> associates metadata with an existing type. This "AnnotatedType" would be
> essentially a tuple (type, metadata), and would be constant-folded just like
> other types are. Just like the existing GC intrinsics today, there would be
> some way for a post-optimization pass to get access to all of the stack
> variables an examine the annotations on each to determine how to construct
> the appropriate static data structures.
>
> This approach has both a number of advantages and a number of challenges.
> The first advantage is that it means that LLVM users who aren't interested
> in GC would pay nothing. A second advantage is that this could also be used
> to wrap types that are not pointers. One use case I have is being able to
> handle a discriminated union type which sometimes holds a pointer and
> sometimes doesn't. The existing intrinsics allow this use case (within the
> limitations that you point out) - the way it works is that the entire struct
> is considered a "root" and is passed through to the GC plugin, which
> generates code to look at the discriminator field and decide whether to
> trace it or not. However, I'm also aware of the fact that I seem to be the
> only one who is interested in this particular case, so I won't strongly
> object if your solution doesn't handle it, as long as the existing
> intrinsics continue to work.
>
> The challenge of this approach is that a lot of backend code will need to
> unwrap the annotated type in order to operate upon it, and it would be all
> too easy to discard the associated metadata as part of this process.

I imagine this could prove useful for even more than GC, as it
introduces what one might think of as type-level, as opposed to
instruction- or value-level, metadata. Being both simpler and more
general, this seems like a much better idea than my proposal, and I'd
be happy to run with it if it checks out. In fact, I would have
eventually ran into exactly the same problem with discriminated unions
that you describe; Idris, like any other modern statically-typed
functional language, has algebraic datatypes after all. Getting
optimization passes to treat AnnotatedTypes the same as their
contained type would be necessary for this to pay off completely; can
anyone comment on what, if any, difficulties might be involved there?

> On Fri, Dec 28, 2012 at 1:09 PM, Benjamin Saunders <ralith at gmail.com> wrote:
>>
>> I'm working on an LLVM backend for Idris, a garbage-collected pure
>> functional programming language, and have experienced some frustration
>> that LLVM's GC support, specifically with regard to mapping roots,
>> operates only on allocas. This entails a lot of otherwise unnecessary
>> stack allocation (especially in a pure language, where in-place
>> mutation is rare) and imposes limitations on what optimizations can be
>> applied. Other LLVM users have used elaborate workarounds to this,
>> such as Rust's use of address spaces and, I believe, GHC's specialized
>> calling convention. I'm interested in extending LLVM to support GC
>> roots in regular SSA values, but, that being a significant change,
>> it's clear that some discussion is in order before diving in if I want
>> to get such a patch merged.
>> This topic has been discussed on multiple previous occasions, and in
>> each case nothing seems to have come of it, though interest appears to
>> be significant. In particular, concerns with how such infrastructure
>> could be made to abide by the invariants of arbitrary GC algorithms
>> seem to have stayed hands. It's not clear to me why that poses a
>> problem--if the property of being a GC root is correctly propagated
>> through all manipulations of a pointer, and that information tracked
>> through register allocation and made available to the GC metadata
>> printer, won't the the resulting system have no more limitations or
>> constraints than the current one? A copying collector would, having a
>> complete list of root locations, still be able to rewrite them; a
>> mark-and-sweep collector would still be able to find everything in
>> need of marking; and so on.
>> If my understanding above is correct, then perhaps the challenge lies
>> in correctly propagating the marking of a pointer in an SSA value as a
>> root through transforms. The pattern of propagation desired seems
>> identical to that of type information, so perhaps it would be best to
>> make the marking of a pointer as a GC root an attribute of its type,
>> much the way address spaces already work? Recall again Rust's approach
>> here, where the behavior of address space information through
>> transforms is exactly what is relied upon.
>> It's easy to imagine a GC root flag on pointer types, but one still
>> needs to attach metadata to enable tagless GC as supported by the
>> existing infrastructure. Rust simply encodes this information into the
>> address space number; a similar approach could be envisioned with a
>> 'GC type ID' number that could be used by the GC metadata printer to
>> look up detailed information in e.g. module-level metadata, but this
>> is a bit awkward; it would be nice to have an interface at least as
>> convenient as the current intrinsic is. If the metadata is uniqued so
>> as not to break type equality and uniquing, would it be viable to have
>> the GCd pointer type itself refer to a metadata node?
>> Finally, is there anything else that needs consideration before attempting
>> this?
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
> --
> -- Talin
>