[LLVMdev] Why LLVM should NOT have garbage collection intrinsics

Mark Shannon marks at dcs.gla.ac.uk
Sun Mar 1 02:41:39 PST 2009


Gordon Henriksen wrote:
> 
> The "runtime interface" is a historical artifact. LLVM does not impose  
> a runtime library on its users. I wouldn't have a problem deleting all  
> mention of it, since LLVM does not impose a contract on the runtime.
> 
Excellent, I found it somewhat unhelpful!

>> The semantics of llvm.gcroot are vague:
>> "At compile-time, the code generator generates information to allow
>> the runtime to find the pointer at GC safe points."
>>
>> Vague, ill-specified interfaces are worse than none.
> 
> There's nothing ill-defined about the semantics of gcroot except  
> insofar as GC code generation is pluggable.
> 
Sorry, but "At compile-time, the code generator generates information to 
allow the runtime to find the pointer at GC safe points." does not 
really say anything.
No one could possibly implement this "specification".

Sorry about all my negative comments. I would like to implement a 
generational collector for LLVM, but I cannot do so in a portable way.

So, here is a suggestion:

Call the GC 'intrinsics' something else ("extrinsics"?) and provide 
low-level intrinsics so that the GC calls (gcroot, gcread and gcwrite) 
can be converted to GC-free LLVM code in a GC-lowering pass.

IR+GC -> | GC Lowering pass | -> IR

Rather than the current:

IR+GC -> | Backend lowering pass(es) | -> SelectionDAG

Read and write barriers can already be written in llvm-IR.
It is the marking of roots that is the problem.
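
To illustrate why barriers need no special support, here is a minimal 
sketch of a card-marking write barrier in plain C; the same logic maps 
directly onto ordinary loads, stores and shifts in IR. The heap layout, 
the card size and the names gc_write, gc_read, heap and card_table are 
assumptions made for this example, not part of LLVM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a heap of pointer-sized slots, covered by a card
   table with one byte per 512-byte card. */
enum { CARD_SHIFT = 9, HEAP_WORDS = 4096 };

static void *heap[HEAP_WORDS];
static unsigned char card_table[sizeof heap >> CARD_SHIFT];

/* Write barrier: perform the store, then mark the card containing the
   updated slot as dirty so a minor collection can rescan it. */
static void gc_write(void **slot, void *value)
{
    size_t offset = (size_t)((char *)slot - (char *)heap);
    assert(offset < sizeof heap);   /* the slot must lie inside the heap */
    *slot = value;
    card_table[offset >> CARD_SHIFT] = 1;
}

/* Read barrier: a plain load is enough for this collector design. */
static void *gc_read(void **slot)
{
    return *slot;
}
```

Nothing here requires knowledge of the stack frame or of registers, 
which is what sets barriers apart from root marking.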

Given that any new intrinsics/instructions are an additional burden on 
all back-ends, I'm not going to propose particular ones, but it seems 
that they are needed.

By the way, I think that adding a GC pointer type is an unnecessary 
burden on the back-ends; front-ends really should be able to handle 
this.

The current trio of gcroot, gcread and gcwrite is OK, BUT GC 
implementations should be able to translate them to llvm-IR so that the 
optimisers and back-ends can do their jobs without worrying about GC 
details.
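
As a rough picture of what such a translation could produce for roots, 
the sketch below uses a shadow stack: each function registers the 
addresses of its pointer slots in a linked list that the collector can 
walk. This is one well-known portable technique, shown in C rather than 
IR; gc_frame, gc_push_frame, gc_pop_frame, gc_visit_roots and the 
example function are all hypothetical names, and this is not a claim 
about what any particular lowering pass emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call record: the addresses of this frame's root
   slots plus a link to the caller's record. */
struct gc_frame {
    struct gc_frame *prev;
    size_t num_roots;
    void ***roots;
};

static struct gc_frame *gc_top = NULL;

static void gc_push_frame(struct gc_frame *f, void ***roots, size_t n)
{
    f->prev = gc_top;
    f->num_roots = n;
    f->roots = roots;
    gc_top = f;
}

static void gc_pop_frame(void)
{
    assert(gc_top != NULL);
    gc_top = gc_top->prev;
}

/* Walk every registered root slot; a collector would do this at a safe
   point. Returns the number of slots visited. */
static size_t gc_visit_roots(void (*visit)(void **slot, void *ctx), void *ctx)
{
    size_t count = 0;
    for (struct gc_frame *f = gc_top; f != NULL; f = f->prev)
        for (size_t i = 0; i < f->num_roots; i++, count++)
            visit(f->roots[i], ctx);
    return count;
}

/* Visitor used below: counts the roots that currently hold a pointer. */
static void count_live(void **slot, void *ctx)
{
    if (*slot != NULL)
        ++*(size_t *)ctx;
}

/* What a function with two gcroot'ed locals might look like once lowered. */
static size_t example(void *obj)
{
    void *a = obj, *b = NULL;        /* the two root slots */
    void **slots[] = { &a, &b };
    struct gc_frame frame;
    size_t live = 0;

    gc_push_frame(&frame, slots, 2);
    gc_visit_roots(count_live, &live);  /* stand-in for a collection */
    gc_pop_frame();
    return live;
}
```

The cost of this approach is that the address of every root escapes, so 
rooted values cannot stay in registers across calls. That overhead is 
the reason lower-level support for locating stack slots is attractive.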

As an aside, I think that debug info can be treated in a similar way:

IR+debug -> | Debug lowering pass | ->  IR

After all, both debug info and GC require similar things: information 
about the location of stack variables (and possibly register variables) 
and the machine location of points in code (for line numbering or 
GC safe points).

If intrinsics/instructions to do the above can be implemented then I 
will port my generational, copying collector to LLVM *and* maintain it 
for as long as possible.

Mark.





