[LLVMdev] Using LLVM with a new runtime environment

Thu Apr 9 18:00:17 PDT 2009

Hello,

I'd like to use LLVM as a code generator for a runtime environment  
(thread scheduler + malloc + GC) I'm in the final stages of developing.

Here is a brief overview: The system maintains a constant number of  
execution resources (either system threads or CPUs), which I call  
"executors".  Threads themselves are very lightweight, consisting only  
of seven words of information, plus scheduling overhead.  A scheduler  
based on lock-free structures maps threads onto executors.  The  
scheduler communicates with threads using mailboxes, and relies on  
voluntary context switches.  The system provides a copying garbage  
collector (similar to Cheng/Blelloch, but based on lock-free data  
structures).  The assumption is that threads will use the GC allocator  
to allocate their frames.

My specific needs are as follows:

- Threads have a mailbox which contains the following information: the  
ID of the executor running them, the current GC status, any signals  
that have been sent to the thread, their GC allocation pointer, the  
limit of their current GC allocation block, a pointer to their GC  
write log, and the number of log entries available.
- Safepoints are necessary, both for the scheduler to work correctly
and for the GC.  The thread-specific data I just mentioned are
assumed to be overwritten at each safepoint.  In between safepoints,
however, they can be assumed to be non-volatile.
- All pointers to GCed objects are represented as pairs of pointers.   
Which pointer is in use depends on the GC status word.
- If the GC status indicates that GC is underway, all writes to GC
memory must be logged before reaching the next safepoint.  Multiple
writes to the same location only need to be logged once before
reaching the safepoint, though.
- GC memory is allocated by advancing the GC allocation pointer.  If  
the allocation block runs out, then a function is called to obtain  
another one.
- Heap objects need to have specifically formatted headers, and I need  
to generate specifically formatted type signatures.
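
To make the mailbox and allocation requirements concrete, here is a
rough C sketch of what I would want the generated code to do for an
allocation.  The field names and their order, the 8-byte rounding, the
block size, and the refill_block() helper are all placeholders for
illustration, not the runtime's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout: one field per item listed above. */
typedef struct mailbox {
    uintptr_t executor_id;  /* ID of the executor running this thread */
    uintptr_t gc_status;    /* current GC status                       */
    uintptr_t signals;      /* signals sent to the thread              */
    char     *alloc_ptr;    /* GC allocation pointer                   */
    char     *alloc_limit;  /* limit of the current allocation block   */
    void    **write_log;    /* pointer to the GC write log             */
    uintptr_t log_avail;    /* number of log entries available         */
} mailbox_t;

enum { BLOCK_BYTES = 256, NUM_BLOCKS = 2 };  /* arbitrary demo sizes */

/* Stand-in for the runtime call that hands out a fresh block. */
static uint64_t backing[NUM_BLOCKS][BLOCK_BYTES / sizeof(uint64_t)];
static int blocks_handed_out;

static void refill_block(mailbox_t *mb) {
    assert(blocks_handed_out < NUM_BLOCKS);
    mb->alloc_ptr   = (char *)backing[blocks_handed_out++];
    mb->alloc_limit = mb->alloc_ptr + BLOCK_BYTES;
}

/* Fast path: advance the pointer.  Slow path: fetch another block. */
static void *gc_alloc(mailbox_t *mb, size_t size) {
    size = (size + 7) & ~(size_t)7;          /* assumed 8-byte alignment */
    assert(size <= BLOCK_BYTES);
    if ((size_t)(mb->alloc_limit - mb->alloc_ptr) < size)
        refill_block(mb);
    void *obj = mb->alloc_ptr;
    mb->alloc_ptr += size;
    return obj;
}
```

In this sketch the mailbox has to be primed with refill_block() before
the first allocation, since gc_alloc() does not handle a null
allocation pointer.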

Obviously, there is potential for optimization in several places.  If
you're not space-sensitive, you can duplicate code to deal with the
double-pointer/write barriers.  You can also cache the data from the
mailboxes between safepoints.  Lastly, write-logging can be delayed
until a safepoint is about to be executed, which might eliminate
duplicate logs as well.
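
As an illustration of the delayed-logging idea, here is a small C
sketch done at run time; a compiler would ideally do the equivalent
statically.  Everything in it is an assumption made for the example:
the GC_IDLE/GC_ACTIVE values, the buffer sizes, the choice to log the
address of the written slot, and the function names.

```c
#include <assert.h>
#include <stddef.h>

enum { GC_IDLE = 0, GC_ACTIVE = 1 };       /* hypothetical status values */
enum { PENDING_MAX = 8, LOG_MAX = 32 };    /* arbitrary demo sizes       */

typedef struct write_state {
    int     gc_status;                /* copy of the GC status word        */
    void  **pending[PENDING_MAX];     /* slots written since last flush    */
    size_t  n_pending;
    void  **log[LOG_MAX];             /* stand-in for the GC write log     */
    size_t  n_logged;
} write_state_t;

/* Move every pending slot address into the write log. */
static void flush_pending(write_state_t *ws) {
    for (size_t i = 0; i < ws->n_pending; i++) {
        assert(ws->n_logged < LOG_MAX);
        ws->log[ws->n_logged++] = ws->pending[i];
    }
    ws->n_pending = 0;
}

/* Store through a write barrier; logging is deferred and deduplicated. */
static void barriered_store(write_state_t *ws, void **slot, void *value) {
    *slot = value;
    if (ws->gc_status != GC_ACTIVE)
        return;                       /* no GC underway: nothing to log */
    for (size_t i = 0; i < ws->n_pending; i++)
        if (ws->pending[i] == slot)
            return;                   /* already pending: log only once */
    if (ws->n_pending == PENDING_MAX)
        flush_pending(ws);            /* early flush may log a slot twice */
    ws->pending[ws->n_pending++] = slot;
}

/* All pending writes must reach the log before the safepoint runs. */
static void safepoint(write_state_t *ws) {
    flush_pending(ws);
}
```

Two stores to the same slot between safepoints produce a single log
entry here, at the price of a linear scan over the pending buffer.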

Having read over the LLVM documentation, it seems that most of this
should be fairly easy.  I have two questions.  First, are there any
potential snags or pitfalls, or will any of this require a substantial
amount of work?  Second, how many of the aforementioned optimizations
would LLVM do of its own accord?

Thanks.

-- 
Eric McCorkle
Computer Science Ph.D. Student
ericmcc at cs.umass.edu