[LLVMdev] Extending GC infrastructure for roots in SSA values

Fri Dec 28 13:09:37 PST 2012

I'm working on an LLVM backend for Idris, a garbage-collected pure
functional programming language, and have experienced some frustration
that LLVM's GC support, specifically with regard to mapping roots,
operates only on allocas. This entails a lot of otherwise unnecessary
stack allocation (especially in a pure language, where in-place
mutation is rare) and imposes limitations on what optimizations can be
applied. Other LLVM users have used elaborate workarounds to this,
such as Rust's use of address spaces and, I believe, GHC's specialized
calling convention. I'm interested in extending LLVM to support GC
roots in regular SSA values, but, that being a significant change,
it's clear that some discussion is in order before diving in if I want
to get such a patch merged.
This topic has been discussed on multiple previous occasions, and in
each case nothing seems to have come of it, though interest appears to
be significant. In particular, concerns with how such infrastructure
could be made to abide by the invariants of arbitrary GC algorithms
seem to have stayed hands. It's not clear to me why that poses a
problem--if the property of being a GC root is correctly propagated
through all manipulations of a pointer, and that information tracked
through register allocation and made available to the GC metadata
printer, won't the the resulting system have no more limitations or
constraints than the current one? A copying collector would, having a
complete list of root locations, still be able to rewrite them; a
mark-and-sweep collector would still be able to find everything in
need of marking; and so on.
If my understanding above is correct, then perhaps the challenge lies
in correctly propagating the marking of a pointer in an SSA value as a
root through transforms. The pattern of propagation desired seems
identical to that of type information, so perhaps it would be best to
make the marking of a pointer as a GC root an attribute of its type,
much the way address spaces already work? Recall again Rust's approach
here, where the behavior of address space information through
transforms is exactly what is relied upon.
It's easy to imagine a GC root flag on pointer types, but one still
needs to attach metadata to enable tagless GC as supported by the
existing infrastructure. Rust simply encodes this information into the
address space number; a similar approach could be envisioned with a
'GC type ID' number that could be used by the GC metadata printer to
look up detailed information in e.g. module-level metadata, but this
is a bit awkward; it would be nice to have an interface at least as
convenient as the current intrinsic is. If the metadata is uniqued so
as not to break type equality and uniquing, would it be viable to have
the GCd pointer type itself refer to a metadata node?
Finally, is there anything else that needs consideration before attempting this?