[LLVMdev] llvm.gcroot suggestion

Talin viridia at gmail.com
Mon Mar 7 09:46:20 PST 2011


On Mon, Mar 7, 2011 at 9:35 AM, Talin <viridia at gmail.com> wrote:

> On Mon, Mar 7, 2011 at 4:08 AM, nicolas geoffray <
> nicolas.geoffray at gmail.com> wrote:
>
>> Hi Talin,
>>
>> On Sat, Mar 5, 2011 at 6:42 PM, Talin <viridia at gmail.com> wrote:
>>>
>>>
>>> So I've been thinking about your proposal, that of using a special
>>> address space to indicate garbage collection roots instead of intrinsics.
>>
>>
>>  Great!
>>
>>
>>>
>>> To address this, we need a better way of telling LLVM that a given
>>> variable is no longer a root.
>>>
>>
>> Live variable analysis is already in LLVM and for me that's enough to know
>> whether a given variable is no longer a root. Note that each safe point has
>> its own set of root locations, and these locations all contain live
>> variables. Dead variables may still be in register or stack, but the GC will
>> not visit them.
>>
>>
>>> 2) As I mentioned, my language supports tagged unions and other "value"
>>> types. Another example is a tuple type, such as (String, String). Such types
>>> are never allocated on the heap by themselves, because they don't have the
>>> object header structure that holds the type information needed by the
>>> garbage collector. Instead, these values can live in SSA variables, or in
>>> allocas, or they can be embedded inside larger types which do live on the
>>> heap.
>>>
>>
>> If you know, at compile-time, whether you are dealing with a struct or a
>> heap object, what prevents you from emitting code that won't need such
>> tagged unions in the IR? The same goes for structs: if they contain
>> pointers to heap objects, those will be in that special address space.
>>
>
> I'm not sure what you mean by this.
>
> Take for example a union of a String (which is a pointer) and a float. The
> union is either { i1, String* } or { i1, float }. The garbage collector
> needs to see that i1 in order to know whether the second field of the struct
> is a pointer - if it attempted to dereference the pointer when the field
> actually contains a float, the program would crash. The metadata argument
> that I pass to llvm.gcroot informs the garbage collector about the structure
> of the union.
>

Sorry, I left a part out. The way my garbage collector currently works is
that the collector gets a pointer to the entire union struct, not just
the pointer field within the union. In other words, the entire union struct
is considered a "root".
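
A rough sketch of that idea, written in C++ rather than LLVM IR. All names
here (StringOrFloat, traceUnionRoot, makeString, makeFloat) are hypothetical
and are not taken from the actual collector, and the meaning of the tag
(true = pointer) is an assumption. The point is only that the visitor is
handed the whole union and checks the tag before it treats the payload as a
pointer:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical heap object; stands in for a String.
struct String { const char *chars; };

static_assert(sizeof(float) <= sizeof(String *),
              "payload must be large enough for either member");

// Hypothetical layout for the union "String or float": a tag followed by
// storage big enough for either payload.
struct StringOrFloat {
  bool isString;  // plays the role of the i1 tag (assumed: true = pointer)
  alignas(String *) unsigned char payload[sizeof(String *)];
};

// Hypothetical root visitor. It receives the whole union, not the inner
// field, and records the String* only when the tag says one is present.
void traceUnionRoot(const StringOrFloat *root, std::vector<String *> &found) {
  assert(root != nullptr);
  if (!root->isString)
    return;  // payload holds a float; it must not be followed as a pointer
  String *s;
  std::memcpy(&s, root->payload, sizeof s);
  found.push_back(s);
}

// Helpers that build each variant of the union.
StringOrFloat makeString(String *s) {
  StringOrFloat u{};
  u.isString = true;
  std::memcpy(u.payload, &s, sizeof s);
  return u;
}

StringOrFloat makeFloat(float f) {
  StringOrFloat u{};
  u.isString = false;
  std::memcpy(u.payload, &f, sizeof f);
  return u;
}
```

Tracing a union built with makeFloat leaves the list of found pointers
untouched, while one built with makeString contributes exactly its String*.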

In fact, there might not even be a pointer in the struct. You see, because
LLVM doesn't directly support unions, I have to simulate that support by
casting pointers. That is, for each different type contained in the union, I
have a different struct type, and when I want to extract data from the union
I cast the pointer to the appropriate type and then use GEP to get the data
out. However, when allocating storage for the union, I have to use the
largest data type, which might not be a pointer.

For example, suppose I have a type "String or (float, float, float)" - that
is, a union of a string and a 3-tuple of floats. Most of the time what LLVM
will see is { i1, { float, float, float } }, because that's bigger than
{ i1, String* }. LLVM won't even know there's a pointer in there, except
during those brief times when I'm accessing the pointer field. So tagging
the pointer in a different address space won't help at all here.
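
To make the layout issue concrete, here is a minimal C++ sketch of the same
scheme. The names (AsString, AsTuple, UnionStorage and the helper functions)
are made up for illustration, and the tag meaning (true = String) is an
assumption. Note also that it copies bytes with memcpy where the generated
IR would cast the pointer and use GEP, because casting between unrelated
struct types is undefined behavior in C++. The declared storage type
contains no pointer field at all; a pointer only shows up while the bytes
are being viewed as the String variant:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical heap object; stands in for a String.
struct String { const char *chars; };

// One struct per union member, each starting with the tag.
struct AsString { bool tag; String *value; };   // like { i1, String* }
struct AsTuple  { bool tag; float x, y, z; };   // like { i1, { float, float, float } }

// Storage is sized and aligned for the largest member. Nothing in this
// declaration says that a pointer may live inside it.
constexpr std::size_t kSize =
    sizeof(AsString) > sizeof(AsTuple) ? sizeof(AsString) : sizeof(AsTuple);
constexpr std::size_t kAlign =
    alignof(AsString) > alignof(AsTuple) ? alignof(AsString) : alignof(AsTuple);

struct UnionStorage {
  alignas(kAlign) unsigned char bytes[kSize];
};

void storeString(UnionStorage &u, String *s) {
  AsString v{true, s};
  std::memcpy(u.bytes, &v, sizeof v);
}

void storeTuple(UnionStorage &u, float x, float y, float z) {
  AsTuple v{false, x, y, z};
  std::memcpy(u.bytes, &v, sizeof v);
}

// The tag sits at offset 0 in both variants, so it can be read directly.
bool holdsString(const UnionStorage &u) {
  bool tag;
  std::memcpy(&tag, u.bytes, sizeof tag);
  return tag;
}

// Views the storage as the String variant; only valid when the tag is set.
String *loadString(const UnionStorage &u) {
  assert(holdsString(u));
  AsString v;
  std::memcpy(&v, u.bytes, sizeof v);
  return v.value;
}
```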


>>> 3) I've been following the discussions on llvm-dev about the use of the
>>> address-space property of pointers to signal different kinds of memory pools
>>> for things like shared address spaces. If we try to use that same variable
>>> to indicate garbage collection, now we have to multiplex both meanings onto
>>> the same field. We can't just dedicate one special ID for the garbage
>>> collected heap, because there could be multiple such heaps. As you add
>>> additional orthogonal meanings to the address-space field, you end up with a
>>> combinatorial explosion of possible values for it.
>>>
>>>
>> I think some conventions already exist between an ID and some codegen.
>> Having one more seems fine to me, even if you need to play with bits in
>> case you need different IDs for a single pointer.
>>
>> I'm also fine with the intrinsic way of declaring a GC root. But I think
>> it is cumbersome, and error-prone in the presence of optimizers that may try
>> to move that intrinsic away (I remember similar issues with the current EH
>> intrinsics).
>>
>> Nicolas
>>
>>
>>> --
>>> -- Talin
>>>
>>
>>
>
>
> --
> -- Talin
>



-- 
-- Talin