[LLVMdev] Extending GC infrastructure for roots in SSA values

Sun Dec 30 16:16:51 PST 2012

On Sun, Dec 30, 2012 at 11:02 AM, Talin <viridia at gmail.com> wrote:
> On Sun, Dec 30, 2012 at 2:17 AM, David Chisnall
> <David.Chisnall at cl.cam.ac.uk> wrote:
>>
>> On 30 Dec 2012, at 01:54, Talin wrote:
>>
>> > I completely agree with your point about wanting to be able to attach GC
>> > metadata to a type (rather than attaching it to a value, as is done now). In
>> > the past, there have been two objections to this approach: first, the
>> > overhead that would be added to the Pointer type - the vast majority of LLVM
>> > users don't want to have to pay an extra 4-8 bytes per Pointer type. And
>> > second, that all of the optimization passes would have to be updated so as
>> > to not do illegal transformations on a GC type.
>>
>> There are two other alternatives:
>>
>> - Use address spaces to separate garbage-collected from
>> non-garbage-collected pointers.  There is (was?) a plan to add an address
>> space cast instruction and explicitly disallow bitcasts of pointers between
>> address spaces.  This would mean that you could have one address space for
>> GC roots, one for GC-allocated memory and enforce the casts in your front
>> end.  Optimisations would then not be allowed to change the address space of
>> any pointers, so the GC status would be preserved.  GC-aware allocations may
>> insert explicit address space casts, where appropriate.
>
>
> This works fine for languages like Java where every object has a type field
> that describes how to trace it. However, the existing LLVM intrinsics also
> support the case where the type information is only known statically by the
> compiler instead of at runtime - the metadata argument allows the compiler
> to pass a trace table to the GC plugin. Trying to encode that information
> into a single address space integer would be painful.

Indeed; this sort of tagless GC is exactly what I want to support. An
interesting note is that Rust currently implements exactly the
workaround you describe (see
https://github.com/elliottslaughter/rust-gc-notes ), but, hackiness
aside, this has also caused problems with attempts to support targets
where address spaces actually have special meaning (see
http://blog.theincredibleholk.org/blog/2012/12/05/compiling-rust-for-gpus/
). I'm certain they'd be happy to be able to replace that with an
approach such as we're discussing.
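
To make the tagless case concrete, here is a minimal sketch in C of
what I have in mind. It is not anything LLVM provides: TraceTable,
Pair, trace_object and count_live are hypothetical names made up for
illustration. The assumption is that the compiler emits one static
table of pointer-field offsets per type and hands it to the collector
(in the spirit of the metadata argument to llvm.gcroot), so the
objects themselves carry no header or type field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compiler-emitted trace table: the byte offsets of the
   pointer fields inside one statically known type. */
typedef struct {
    size_t count;
    const size_t *offsets;
} TraceTable;

/* Example type: two traceable pointer fields and one plain integer.
   Note that it has no header word describing its layout. */
typedef struct Pair {
    void *first;
    long tag;
    void *second;
} Pair;

static const size_t pair_offsets[] = {
    offsetof(Pair, first),
    offsetof(Pair, second)
};
static const TraceTable pair_table = { 2, pair_offsets };

/* Callback invoked once per pointer slot found in an object. */
typedef void (*Visitor)(void **slot, void *ctx);

/* Visit every pointer slot of `obj`, using only the static table. */
static void trace_object(void *obj, const TraceTable *table,
                         Visitor visit, void *ctx) {
    assert(obj != NULL && table != NULL);
    for (size_t i = 0; i < table->count; i++) {
        void **slot = (void **)((char *)obj + table->offsets[i]);
        visit(slot, ctx);
    }
}

/* Sample visitor: counts the non-null pointers it is shown. */
static void count_live(void **slot, void *ctx) {
    if (*slot != NULL)
        ++*(size_t *)ctx;
}
```

A real collector would mark or relocate the referent in the visitor
instead of counting. The point is only that the layout information
lives in the table, which is more than a single address space number
can carry.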

>> - Add a new GC'd pointer type, which is an entirely separate type.  This
>> might make sense, as you ideally want GC'd pointers to be treated
>> differently from others (e.g. you may not want pointers to the starts of
>> allocations to be removed)

This seems like a good fit for what I originally described (though I
like Talin's proposal better): it is functionally equivalent, but
avoids the overhead for LLVM users who don't need GC. That said, is it
really plausible that anyone would care about an extra 4-8 bytes per
PointerType? I haven't profiled, but I can't imagine that it would
dramatically increase the resource usage of the tools, or of a
compiler using them.

>> For languages like OCaml, you also want to be able to do escape analysis
>> on GC'd pointers to get good performance (so you don't bother tracing ones
>> that can't possibly escape).  This ideally requires a pass that will
>> recursively and automatically apply nocapture attributes to arguments.  In
>> functional languages, this ends up being almost all allocations, so you can
>> allocate them either on the stack or on a separate bump-the-pointer
>> allocator and delete them on function return by just resetting the pointer.
>> This means that you would want to be able to have transforms that lowered
>> GC'd pointers to stack or heap pointers.

Is there any particular reason to expect that supporting this would
pose a problem? What might prevent it?
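
For reference, the bump-the-pointer scheme described above can be
sketched in a few lines of C. This is only an illustration under my
own assumptions: Region, region_alloc, region_mark and region_reset
are hypothetical names, and the 8-byte rounding is an arbitrary
choice. Allocation just advances an offset, and everything allocated
since a saved mark is released at once by restoring that mark on
function return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region for allocations proven not to escape. */
typedef struct {
    unsigned char *base;  /* start of the backing buffer */
    size_t capacity;      /* size of the buffer in bytes */
    size_t used;          /* bytes handed out so far */
} Region;

/* Allocate `size` bytes by bumping the offset; sizes are rounded up
   to a multiple of 8. Returns NULL when the region is exhausted. */
static void *region_alloc(Region *r, size_t size) {
    size_t aligned = (size + 7u) & ~(size_t)7u;
    if (aligned < size || aligned > r->capacity - r->used)
        return NULL;
    void *p = r->base + r->used;
    r->used += aligned;
    return p;
}

/* Remember the current position, e.g. on function entry. */
static size_t region_mark(const Region *r) {
    return r->used;
}

/* Free everything allocated since `mark`, e.g. on function return. */
static void region_reset(Region *r, size_t mark) {
    assert(mark <= r->used);
    r->used = mark;
}
```

A transform that lowers a non-escaping GC'd pointer would replace the
GC allocation with something like region_alloc (or a plain stack
slot), and the collector would never need to trace it.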

>> In some implementations, GC'd pointers are fat pointers, so they should
>> not be represented as PointerType in the IR or as iPTR in the back end.

I expect that implementation techniques in those cases would be
unaffected by my proposed changes.

>> David
>
>
>
>
> --
> -- Talin