[LLVMdev] Garbage Collection Project

Jon Harrop jon at ffconsultancy.com
Thu Jun 18 06:04:08 PDT 2009

On Thursday 18 June 2009 11:14:35 Granville Barnett wrote:
> > Firstly, rather than using a single 1 word pointer to represent a
> > reference I
> > chose to use 3 words including a pointer to the type and a pointer to the
> > value (as well as metadata). This allows typed nulls and that addresses
> > an important deficiency found in most other VMs including the CLR. Is
> > Scarcity able to handle such references or does its implementation of
> > stack frames require references to be a single word?
> Three words sounds pretty expensive to me, I can see the use of an extra
> word for typed nulls.

The word is not really "extra" because I just moved the header out of the 
value and into the reference. For example, the three words for an array are a 
pointer to the type, the array length and a pointer to the (C-compatible) 
array data. In addition to providing typed nulls, this has the added 
advantage of simplifying interop.

> If you look at something like the CLR you will see 
> that you have very fast access to an objects type, after all when you have
> a reference to an object what you really have is a reference to an objects'
> object header. From there you can access an objects syncblock (if it has
> one, which is before the obj header) and the objects' method table which
> includes the relevant pointers to the objects' type among other things. It
> simply means you do a few hops, each of which is a constant operation
> anyway.

That is similar to the approach I used, although HLVM provides a pointer 
directly to the type, saving you a single hop.

> Maybe I'm missing something key about the language you are implementing the
> GC for, also is it really necessary to use an extra word for null types?

I would not have considered it had I not seen the damage done to F# by the 
CLR's typeless nulls. You want an unallocated constant for many different 
purposes in F#, such as representing the empty option type None, the empty 
list [] and maybe the unit value (). Unfortunately, using "null" means you 
don't get any type information and that means that none of those values work 
correctly with anything requiring run-time type information such as generic 
printing, serialization and so on. The only alternative is to use an 
allocated value but the allocator and (concurrent) GC are very slow so that 
gives you the functionality you want but only at a grave cost in terms of 

Consequently, they chose to represent the empty list with an allocated value 
(so [] prints correctly) but the option type uses null. Hence printf "%A" 
None prints "<null>". They've also used other tricks like having an internal 
set representation that uses nulls but is wrapped in another representation 
that handles them correctly but only at the cost of an extra level of 

Dr Jon Harrop, Flying Frog Consultancy Ltd.

More information about the llvm-dev mailing list