[LLVMdev] Getting started with GC

Tom Brown tdbrown at uiuc.edu
Fri Dec 3 04:55:15 PST 2004


On Fri, Oct 29, 2004 at 02:19:53AM -0500, Chris Lattner wrote:
> On Thu, 28 Oct 2004, Tom Brown wrote:
> > Do you have any intentions for the use of Meta in SemiSpace?
> I'm not sure what you mean by 'meta', but if you mean metadata, yes.
I meant the argument in %llvm.gcroot(<ty>** %ptrloc, <ty2>* %metadata)
All examples I've seen set it to null and we have no plans for it.
 

> Given this, it seems like your GC
> needs the front-end to provide the following:
> 
> 1. A way of identifying/iterating all of the global references.  It seems
>    reasonable for the GC to expose a function like "void gc_add_static_root(void**)"
For now, calling llvm.gcroot in main() for every global pointer should suffice.
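
For comparison, here is a rough sketch of what the suggested
gc_add_static_root could look like on the GC side. Only the function
signature comes from the quoted text. The fixed-size table,
MAX_STATIC_ROOTS, gc_visit_static_roots and count_live are made-up names
for illustration, not part of any existing interface.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATIC_ROOTS 128  /* hypothetical fixed capacity */

/* Addresses of the global pointer variables the front-end has registered. */
static void **static_roots[MAX_STATIC_ROOTS];
static size_t num_static_roots = 0;

/* Signature as suggested in the quoted text; the body is an assumption. */
void gc_add_static_root(void **root) {
    assert(num_static_roots < MAX_STATIC_ROOTS);
    static_roots[num_static_roots++] = root;
}

/* Hypothetical helper: call visit() on every registered global root. */
static void gc_visit_static_roots(void (*visit)(void **root)) {
    for (size_t i = 0; i < num_static_roots; ++i)
        visit(static_roots[i]);
}

/* Example visitor: counts the roots that currently hold a non-null pointer. */
static size_t live_roots = 0;
static void count_live(void **root) {
    if (*root != NULL)
        ++live_roots;
}
```

A collector would pass its own "mark and maybe relocate" visitor where
this sketch passes count_live.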


> 2. A way of getting the size of an object.  Initially you can use a header
>    word on the object, but eventually, it would be nice if the front-end
>    runtime library would provide a function, say
>    "unsigned gc_fe_get_object_size(void *object)", which could be called
>    by the GC implementation to get this information.
I imagine a collector moves and deletes blocks of memory as returned by
malloc (AKA llvm_gc_allocate). The collector does not care what the
application does with the block as long as it can find all the pointers
going into and out of the block. Thus the collector should keep track of
the size of each block independently from the frontend. Brian created
a GC internal object that keeps metadata for each block. Given an
address anywhere in a block returned by llvm_gc_allocate, the object
returns size, lowest_address_in_block, and offsets_to_pointers_in_block
(see below).
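
To make the idea concrete, here is a minimal sketch of such a metadata
object. It is not Brian's actual code. The three field names follow the
description above; BlockMeta, num_pointers, meta_add_block,
meta_find_block, the fixed capacities and the linear search are
assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 64          /* hypothetical fixed capacity */
#define MAX_PTRS_PER_BLOCK 8   /* hypothetical fixed capacity */

/* Hypothetical per-block record kept by the collector. */
typedef struct {
    char  *lowest_address_in_block;                          /* block start */
    size_t size;                                             /* bytes */
    size_t offsets_to_pointers_in_block[MAX_PTRS_PER_BLOCK]; /* outgoing ptrs */
    size_t num_pointers;                                     /* entries used */
} BlockMeta;

static BlockMeta block_table[MAX_BLOCKS];
static size_t    num_blocks = 0;

/* Record a freshly allocated block; it starts with no known pointers. */
static BlockMeta *meta_add_block(void *start, size_t size) {
    assert(num_blocks < MAX_BLOCKS);
    BlockMeta *m = &block_table[num_blocks++];
    m->lowest_address_in_block = (char *)start;
    m->size = size;
    m->num_pointers = 0;
    return m;
}

/* Find the block containing addr (interior addresses are allowed).
   Returns NULL if addr is not inside any recorded block. */
static BlockMeta *meta_find_block(const void *addr) {
    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < num_blocks; ++i) {
        uintptr_t lo = (uintptr_t)block_table[i].lowest_address_in_block;
        if (a >= lo && a - lo < block_table[i].size)
            return &block_table[i];
    }
    return NULL;
}
```

A real collector would probably replace the linear scan with something
ordered by address, but the interface stays the same.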


> 3. A way of iterating over the (language specific) GC map information for
>    a heap object.  In particular, you need a way to identify the
>    offsets of all of the outgoing pointers from the start of a heap object.
>    I'm not sure what the best way to do this is (or the best way to
>    encode this) but I'm sure you can come up with something. :)  The most
>    naive and inefficient interface (always a good starting place) would be
>    to have the front-end provide a callback:
>      _Bool gc_fe_is_pointer(void *obj, unsigned offset)
>    that the GC can use to probe all of the words in the object to see if
>    they are pointers.  This can obviously be improved. :)
The problem with gc_fe_is_pointer is that the GC would have to tell the
FE whenever it moves a block of memory, and the FE would then need to
update its own structures. This would duplicate much of the block
metadata logic in the GC.
I noticed that llvm.gcroot provides a way for the FE to declare
pointers on the stack. We will implement an equivalent function for
pointers in the heap, gc_register_pointer. It will add entries to
offsets_to_pointers_in_block for the block containing the pointer.
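
A small sketch of how gc_register_pointer might work, under stated
assumptions: the signature (a void** slot address, returning 0 or -1) is
a guess, since we have not settled it, and HeapBlock, track_block and
the fixed limits are hypothetical stand-ins for the collector's real
bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { kMaxBlocks = 32, kMaxPtrs = 8 };  /* hypothetical limits */

/* Hypothetical per-block record; ptr_offsets plays the role of
   offsets_to_pointers_in_block. */
typedef struct {
    char  *base;
    size_t size;
    size_t ptr_offsets[kMaxPtrs];
    size_t num_ptrs;
} HeapBlock;

static HeapBlock heap_blocks[kMaxBlocks];
static size_t    heap_block_count = 0;

/* Stand-in for the bookkeeping the allocator would do for each block. */
static void track_block(void *base, size_t size) {
    assert(heap_block_count < kMaxBlocks);
    HeapBlock *b = &heap_blocks[heap_block_count++];
    b->base = (char *)base;
    b->size = size;
    b->num_ptrs = 0;
}

/* Record that the slot at ptrloc holds a pointer. Returns 0 on success,
   -1 if the slot is not fully inside a tracked block or the block's
   offset list is full. */
static int gc_register_pointer(void **ptrloc) {
    uintptr_t a = (uintptr_t)ptrloc;
    for (size_t i = 0; i < heap_block_count; ++i) {
        HeapBlock *b = &heap_blocks[i];
        uintptr_t lo = (uintptr_t)b->base;
        if (a >= lo && a - lo + sizeof(void *) <= b->size) {
            if (b->num_ptrs == kMaxPtrs)
                return -1;
            b->ptr_offsets[b->num_ptrs++] = (size_t)(a - lo);
            return 0;
        }
    }
    return -1;
}
```

Because only the offset is stored, the entry stays valid when the
collector moves the block; nothing has to be reported back to the FE.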

The weakness of this scheme is that once the pointers in a block have
been registered, the FE must always use them as pointers. In other
words, the type associated with a block should not change during the
block's lifetime. I can't think of a sane example of why someone would
want to violate this restriction.


> The next step in this is to flush out the interface between the GC and
> the language runtime.  The GC doesn't want to know anything about how the
> FE lays out vtables and GC info
Are vtables created in memory the FE gets from the heap? If we use
gc_register_pointer, does the GC need to indirectly access the vtables?


> Note that (by hacking on alloc_loop and other testcases you write), you
> are basically writing a file that would be generated by a garbage
> collected language front-end.
A group working on an OCaml -> LLVM translator showed us their
preliminary work. They are using "malloc <type>" to allocate heap
memory. How difficult would it be to implement "llvm_gc_allocate <type>"
as a call to "llvm_gc_allocate(unsigned %Size)" followed by calls to
gc_register_pointer as needed by <type>? We were oblivious to this
version of malloc until today. Is there a reason why "llvm_gc_allocate
<type>" is not in GCInterface.h beyond the difficulty of representing it
in C?
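
To illustrate what I mean, here is a sketch of the lowering in C. It is
only an assumption about how it could be done: TypeDesc,
gc_allocate_typed and the Cons example are invented for this message,
and llvm_gc_allocate and gc_register_pointer are replaced by trivial
stand-ins (calloc and a log array) so the fragment compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the collector's allocator; the real one is in the GC runtime. */
static void *llvm_gc_allocate(unsigned Size) {
    return calloc(1, Size);
}

/* Stand-in for gc_register_pointer that only logs the registered slots. */
static void **registered[16];
static size_t registered_count = 0;
static void gc_register_pointer(void **ptrloc) {
    assert(registered_count < 16);
    registered[registered_count++] = ptrloc;
}

/* Hypothetical descriptor the front-end would emit once per <type>. */
typedef struct {
    unsigned      size;             /* bytes to allocate */
    size_t        num_pointers;     /* entries in pointer_offsets */
    const size_t *pointer_offsets;  /* byte offsets of pointer fields */
} TypeDesc;

/* "llvm_gc_allocate <type>": allocate by size, then register each
   pointer field of the new object. Returns NULL if allocation fails. */
static void *gc_allocate_typed(const TypeDesc *ty) {
    char *obj = llvm_gc_allocate(ty->size);
    if (obj == NULL)
        return NULL;
    for (size_t i = 0; i < ty->num_pointers; ++i)
        gc_register_pointer((void **)(obj + ty->pointer_offsets[i]));
    return obj;
}

/* Example type: a cons cell with two pointer fields and one plain field. */
struct Cons { void *head; void *tail; int tag; };
static const size_t cons_offsets[] = {
    offsetof(struct Cons, head),
    offsetof(struct Cons, tail),
};
static const TypeDesc cons_desc = { sizeof(struct Cons), 2, cons_offsets };
```

A front-end would then call gc_allocate_typed(&cons_desc) wherever it
currently emits "malloc <type>" for a cons cell.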


Thank you for your previous useful reply,
Tom

-- 
28 70 20 71 2C 65 29 61 9C B1 36 3D D4 69 CE 62 4A 22 8B 0E DC 3E
mailto:tdbrown at uiuc.edu
http://thecap.org/



