[LLVMdev] llvm.gcroot suggestion

Mon Mar 7 04:08:12 PST 2011

Hi Talin,

On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>
>
> So I've been thinking about your proposal, that of using a special address
> space to indicate garbage collection roots instead of intrinsics.

Great!

>
> To address this, we need a better way of telling LLVM that a given variable
> is no longer a root.
>

Live variable analysis is already in LLVM, and for me that's enough to know
whether a given variable is no longer a root. Note that each safe point has
its own set of root locations, and these locations all contain live
variables. Dead variables may still be in a register or on the stack, but
the GC will not visit them.
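
To make the per-safe-point root sets concrete, here is a minimal sketch, not
LLVM's actual implementation. It assumes a single straight-line block, and the
names `Instr`, `roots_at_safepoints` and `gc_vars` are hypothetical, made up
for illustration. It walks the block backwards, tracks the live set, and
records at each safe point only the GC pointers that are live across it.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """One instruction in a toy straight-line block (hypothetical model)."""
    name: str
    defs: set = field(default_factory=set)   # variables written
    uses: set = field(default_factory=set)   # variables read
    safepoint: bool = False                  # may the GC run here?


def roots_at_safepoints(block, gc_vars, live_out=frozenset()):
    """Backward liveness pass; returns {safepoint name: set of root variables}."""
    live = set(live_out)
    roots = {}
    for instr in reversed(block):
        if instr.safepoint:
            # Roots are the GC pointers live *across* the safe point:
            # live after it, excluding whatever the instruction itself defines.
            roots[instr.name] = (live - instr.defs) & gc_vars
        # Standard liveness transfer function: live_in = (live_out - defs) | uses
        live = (live - instr.defs) | instr.uses
    return roots


block = [
    Instr("a = alloc", defs={"a"}, safepoint=True),
    Instr("b = alloc", defs={"b"}, safepoint=True),
    Instr("use(a)", uses={"a"}),
    Instr("c = alloc", defs={"c"}, safepoint=True),
    Instr("ret(b, c)", uses={"b", "c"}),
]
roots = roots_at_safepoints(block, gc_vars={"a", "b", "c"})
```

In this toy block, `a` is a root at the second allocation but not at the
third one: it is dead by then, so the GC would not visit it even if its old
value were still sitting in a register or stack slot.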

> 2) As I mentioned, my language supports tagged unions and other "value"
> types. Another example is a tuple type, such as (String, String). Such types
> are never allocated on the heap by themselves, because they don't have the
> object header structure that holds the type information needed by the
> garbage collector. Instead, these values can live in SSA variables, or in
> allocas, or they can be embedded inside larger types which do live on the
> heap.
>

If you know, at compile time, whether you are dealing with a struct or a
heap object, what prevents you from emitting code that does not need such
tagged unions in the IR? The same goes for structs: if they contain pointers
to heap objects, those pointers will be in that special address space.

> 3) I've been following the discussions on llvm-dev about the use of the
> address-space property of pointers to signal different kinds of memory pools
> for things like shared address spaces. If we try to use that same variable
> to indicate garbage collection, now we have to multiplex both meanings onto
> the same field. We can't just dedicate one special ID for the garbage
> collected heap, because there could be multiple such heaps. As you add
> additional orthogonal meanings to the address-space field, you end up with a
> combinatorial explosion of possible values for it.
>
>
I think some conventions already exist that tie an address-space ID to
specific codegen. Adding one more seems fine to me, even if you have to play
with bits when a single pointer needs to carry several IDs.
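
As an illustration of what "playing with bits" could look like, here is a
small sketch. The layout is purely an assumption for the example (lowest bit
marks a GC pointer, remaining bits carry a memory-pool ID); it is not an
existing LLVM convention, and `GC_BIT`, `POOL_SHIFT`, `encode_addrspace` and
`decode_addrspace` are hypothetical names.

```python
GC_BIT = 0x1      # assumed: bit 0 set means "pointer into a GC-managed heap"
POOL_SHIFT = 1    # assumed: remaining bits hold the memory-pool ID


def encode_addrspace(pool_id, is_gc):
    """Pack a pool ID and a GC flag into one address-space number."""
    if pool_id < 0:
        raise ValueError("pool_id must be non-negative")
    return (pool_id << POOL_SHIFT) | (GC_BIT if is_gc else 0)


def decode_addrspace(addrspace):
    """Split an address-space number back into (pool_id, is_gc)."""
    return addrspace >> POOL_SHIFT, bool(addrspace & GC_BIT)


gc_in_pool_3 = encode_addrspace(3, True)
plain_default = encode_addrspace(0, False)
```

With this layout the default address space 0 still means "pool 0, not
garbage collected", so existing non-GC pointers are unaffected. The cost is
the one Talin points out: every extra orthogonal property takes more bits
from the same field.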

I'm also fine with the intrinsic way of declaring a GC root. But I think it
is cumbersome, and error-prone in the presence of optimizers that may try to
move that intrinsic away (I remember similar issues with the current EH
intrinsics).

Nicolas

> --
> -- Talin
>