[LLVMdev] llvm.gcroot suggestion

Talin viridia at gmail.com
Sat Mar 5 09:42:42 PST 2011


On Mon, Feb 21, 2011 at 1:50 AM, nicolas geoffray <
nicolas.geoffray at gmail.com> wrote:

> Hi Talin,
>
> On Fri, Feb 18, 2011 at 5:50 PM, Talin <viridia at gmail.com> wrote:
>>
>>
>> In the current scheme, the way you tell LLVM that a root is no longer
>> needed is by assigning NULL to it. However, that assumes that all roots are
>> pointers, which is not true in my world - a root can be a struct containing
>> pointers inside of it. (In my current frontend, a non-pointer root is
>> indicated by passing a non-NULL metadata argument to llvm.gcroot, which
>> contains information about which fields in the struct are roots. This is
>> especially important in the case of tagged unions, where the garbage
>> collector may have to examine the union tag field in order to determine if
>> the pointer field is indeed a pointer - passing the pointer alone would be
>> insufficient to determine this.)
>>
>
> For a tagged union, I guess you are currently using the second argument of
> llvm.gcroot to provde the information? I guess keeping an intrinsic for this
> kind of code is the best way to go.
>
>
>> Putting GC roots in a different address space works OK for me, as long as
>> I can have SSA values that are structs that have pointers embedded in them
>> that are in this different address space. In other words, if I have an SSA
>> value that is a struct containing pointers which are roots, I need for the
>> garbage collector to see the entire struct, not just the pointers.
>>
>
> That's entirely fine with a different address space. The roots given by the
> LLVM GC pass should contain the location of these embedded pointers.
>
>
>>
>> What I'm primarily asking for is to have the LLVM code generator
>> automatically spill roots from SSA values to memory during a sync point, and
>> reload them afterward,
>>
>
>
> I don't think that's even needed: long term, LLVM should return the
> location of all roots for a given sync point (typically method call). By all
> roots, I mean register roots and stack roots. The frontend should then be
> responsible for updating those roots.
>
>
>> instead of my frontend having to generate code to do this. As I mentioned,
>> the current scheme results in the frontend having to generate very
>> inefficient IR because of the need to be conservative about root liveness.
>>
>
> Agree.
>
>
>> The frontend can't know anything about the optimization passes that LLVM
>> will perform on the function.
>>
>
> Sure. And I think the way to go is to remove the llvm.gcroot intrinsic (and
> the hackish way it currently works: right now, because we take the address
> of the alloca, the LLVM optimiziers won't try to optimize an alloca that may
> escape through the llvm.gcroot function call). By having an address space
> for GC roots, optimizers don't need to care about anything. After the
> optimizers and the register allocator, a final LLVM pass should compute the
> root lists of all sync points.
>
> Nicolas
>

So I've been thinking about your proposal, that of using a special address
space to indicate garbage collection roots instead of intrinsics. I want to
point out some of the downsides of this approach:

1) The biggest drawback that I see is that it still requires frontends to
signal that a root is no longer being used by assigning NULL to the pointer.
This turns out to be hard to do in some cases, for example:

   while (true) {
      String s = someFunc();
      if (s->equals("foo")) {
        break;
      } else {
        // call some function with s
      }
   }

In the above example, where would you put the code to zero out the root 's'?
In this case, the answer is, after the body of the loop. Now, normally one
would zero out roots at the end of the block in which the variable is
declared, but in the case of this while loop, it's wasted effort to do that
since the root is just going to get assigned again at the top of the loop.
However, sometimes we never make it to the end of the block, in this case
due to a break statement, and so we have to either null out 's'  either just
before or just after the break.

This example is relatively simple - if we start getting into scenarios
involving switch statements, try/catch, and so on, I can easily construct
complex examples that would make your head spin.

Worse, there are cases where there are multiple code paths exiting from a
block, and you end up having to generate the code to null out a given root
on each of those paths. Remember what I said about frontends having to
generate inefficient code to handle llvm.gcroot? This is one of the
reasons.

To address this, we need a better way of telling LLVM that a given variable
is no longer a root. Of course, in the end it doesn't matter what the
lifetime of the variable is, the only thing that matters is the state of the
variable at each safe point. If there was a way to tell LLVM 'stop tracking
this variable as a root', and then let it worry about whether the value in
the variable is live or not, the generated code could be much more
efficient.

Of course, frontends would still have to deal with multiple exits from a
block, but they can afford, I think, to be somewhat more lax about it. For
example, assume in the example above that we insert a call to
llvm-gcroot-end (or whatever we want to call it) after the while loop. The
compiler knows in this case that all paths originating from the definition
of s must pass through that point. Further, LLVM knows that there's only one
safe point in the while loop, which is in the 'else' block. Thus the only
time the GCStrategy ever 'sees' the variable 's' is at that one point, which
means that there's no need to zero it out. In other words, LLVM knows that
the range over which the variable is live is smaller than the range over
which it is declared a root.

2) As I mentioned, my language supports tagged unions and other "value"
types. Another example is a tuple type, such as (String, String). Such types
are never allocated on the heap by themselves, because they don't have the
object header structure that holds the type information needed by the
garbage collector. Instead, these values can live in SSA variables, or in
allocas, or they can be embedded inside larger types which do live on the
heap.

The way I currently handle such objects is by passing the trace table for
the type as the metadata argument to llvm.gcroot(). A NULL metadata argument
means that the root is a simple pointer, a non-NULL argument means that the
root is a struct.

How do we signal that a struct is no longer a root? Currently I do so by
zeroing out the entire structure, but again that's wasteful. It would be
better to simply tell LLVM that the struct is no longer a root.

3) I've been following the discussions on llvm-dev about the use of the
address-space property of pointers to signal different kinds of memory pools
for things like shared address spaces. If we try to use that same variable
to indicate garbage collection, now we have to multiplex both meanings onto
the same field. We can't just dedicate one special ID for the garbage
collected heap, because there could be multiple such heaps. As you add
additional orthogonal meanings to the address-space field, you end up with a
combinatorial explosion of possible values for it.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110305/31f38bf4/attachment.html>


More information about the llvm-dev mailing list