[LLVMdev] GC heap implementation

Talin viridia at gmail.com
Mon Feb 11 22:51:47 PST 2008


I'd like to share some code with anyone who is interested.

A while back I started working on the problem of implementing a garbage 
collector using the LLVM primitives. While I managed to accomplish a 
fair bit in this direction, I eventually realized that it was 
distracting me from my main goal, which is implementing a compiler for a 
strongly-typed language of my own design. In other words, I was writing 
a GC implementation because I needed one, not necessarily because I 
wanted to write one. I decided to push the GC project to the back burner 
and focus on my main goal, on the theory that while that was happening 
the GC problem might be worked on by someone else.

Although my GC code was initially intended to support my runtime, I've 
lately been using it in the compiler. I realized that in my case, I 
couldn't keep on using llvm::BumpPtrAllocator because I couldn't make 
guarantees about the lifetime of AST nodes in my compiler, and if I used 
a malloc-based allocator I would have to do something like reference 
counting.

Instead, I took one component of my GC - the general heap implementation 
- and built a trivial mark & sweep collector on top of it, which seems 
to be working very well for my purpose.

This GC heap was to be the foundation for my collector. It's loosely 
inspired by the popular dlmalloc implementation (although it doesn't 
directly take any code from it), but it also supports a number of 
features useful for implementing a collector, such as the ability to 
efficiently walk the heap and free blocks based on a predicate callback. 
It also reserves space in the 8-byte allocation header for an opaque 
type tag pointer (or debug string if you don't need tags) as well as 
room for 2 mark bits. It supports aligned allocations, which is useful 
for doing work with SIMD instructions. It supports compile-time 
configurable guard bytes, small block bins, and other things you would 
expect in a modern allocator. The heap is thread-safe, and uses only 
pthread primitives for locking. It allows the heap to be non-contiguous, 
so if you have other libraries that are calling regular libc malloc you 
can still coexist with them. The code is written in fairly-well 
commented ANSI C and includes a number of unit tests.

In any case, this code is currently lying fallow, so if anyone is 
interested in working with it I will be glad to donate it.

-- Talin




More information about the llvm-dev mailing list