[LLVMdev] Improving Garbage Collection

Anderson, Todd A todd.a.anderson at intel.com
Thu Jul 7 13:35:06 PDT 2011


On Thu, Jul 7, 2011 at 10:55 AM, Anderson, Todd A <todd.a.anderson at intel.com> wrote:
For the past few years, my group in Intel Labs has been working on a project similar to LLVM and C--, and perhaps our experience in handling roots and stack walking could be useful in deciding how LLVM should evolve in the GC area.  Our project is called Pillar (you can see our paper "Pillar: A Parallel Implementation Language" in Languages and Compilers for Parallel Computing 2008 for a general overview).  Pillar is a generic language and runtime designed to support managed languages and parallel languages, although most of our work so far has been on the managed side.  We've implemented a JVM and a functional language on top of Pillar, so it has demonstrated some generality.

In the Pillar language, we have a generalized ref (similar to llvm.gcroot) data type that encompasses different kinds of pointers and types into the managed heap.  These could be regular pointers, a couple of varieties of interior pointers, or user-defined pointer types.  The underlying compiler then generates per-callsite stack maps.  Those stack maps can indicate when a live ref (local or parameter) is at a known offset from the start of the frame and can also indicate when a live ref is present in a register.
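
To make that concrete, here is a minimal sketch of what one per-callsite stack map entry might look like.  The names and layout below are purely illustrative, not Pillar's actual definitions:

#include <stdint.h>

/* Illustrative only -- not Pillar's actual definitions. */
typedef enum {
    REF_KIND_OBJECT,        /* ordinary pointer to the start of an object */
    REF_KIND_INTERIOR,      /* pointer into the middle of an object       */
    REF_KIND_USER           /* user-defined pointer type                  */
} RefKind;

/* One live ref at a particular call site: either at a fixed offset
 * from the start of the frame, or held in a register across the call. */
typedef struct {
    RefKind  kind;
    int      in_register;      /* nonzero if the ref lives in a register  */
    union {
        int32_t frame_offset;  /* offset from the start of the frame      */
        int32_t register_id;   /* which callee-saved register holds it    */
    } loc;
} StackMapEntry;

/* Per-callsite stack map: which refs are live across this call. */
typedef struct {
    uintptr_t      callsite_eip;  /* return address identifying the site  */
    uint32_t       num_refs;
    StackMapEntry *refs;
} CallSiteStackMap;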

I assume that by interior pointer you mean a pointer to the inside of another object?
[TAA]
Yes.
If we want to enumerate the refs of a call stack, we call a runtime function, prtYoungestActivation, and that gives us back a stack iterator for the sequence of activations.  The iterator contains the value for esp and pointers to the memory locations where ebp, eip, edi, esi, and ebx were last saved to the stack.  (For architectures other than 32-bit x86, the register-specific contents of the activation would of course be defined differently.)  In the eip case, this is saved for every activation as part of the call sequence and so provides you with the ability to determine which call site you are at.  The other registers may have been saved by the current frame or a subsequent frame.  The stack iterator also contains a virtual frame number that basically allows an unwinder to see logical activations instead of physical activations that may combine multiple logical activations via inlining.
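
Roughly, on 32-bit x86 the iterator might hold something like the following.  The structure name, field names, and layout are guesses for illustration, not the real Pillar headers:

#include <stdint.h>

/* Illustrative layout, assuming 32-bit x86. */
typedef struct {
    uintptr_t  esp;        /* stack pointer value for this activation       */
    uintptr_t *ebp_ptr;    /* where ebp was last saved on the stack         */
    uintptr_t *eip_ptr;    /* where eip was pushed by the call instruction,
                              identifying the call site                     */
    uintptr_t *edi_ptr;    /* locations of the last saves of the other      */
    uintptr_t *esi_ptr;    /* callee-saved registers; each save may belong  */
    uintptr_t *ebx_ptr;    /* to this frame or to a more recent one         */
    unsigned   virtual_frame_number;  /* logical activation within a
                                         physical frame produced by inlining */
} StackIterator;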

That all makes sense.

I would say that your framework is at a somewhat higher level than I would expect LLVM to support - by that, I mean that ideally it should be possible to use the primitives that LLVM supplies to build exactly what you have described. My own use case has slightly different requirements, and as a result I could not adopt your framework as it is - but I see no reason why I should not be able to use the same LLVM primitives to build what I need.

The main difference has to do with the handling of disjoint types in my language. For example, if I declare a variable using the following syntax:
[TAA]
I'm not so sure we couldn't support your case.  If a disjoint type might contain a root then you could define the possible pointer in the struct as a generalized ref having a user-defined type and extraneous data.  That user-defined generalized ref type for you would indicate a disjoint union and the extraneous data could indicate the address of the discriminator in the union.  Whether the data was enregistered or not wouldn't matter in that case.  The GC root callback is passed the user-defined type and extraneous data you declare the field with.
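
As a rough sketch of what I mean (the tag value, callback signature, and discriminator convention below are all invented for illustration):

#include <stdint.h>

#define USER_REF_DISJOINT_UNION  1   /* hypothetical user-defined ref tag */

/* The GC root callback is handed the user-defined type tag and the extra
 * data the field was declared with.  Here the extra data is the address of
 * the union's discriminator, so the callback can decide whether the slot
 * currently holds a pointer at all. */
typedef void (*RootCallback)(void *env, void **root_slot,
                             unsigned user_type, void *extra_data);

static void disjoint_union_root(void *env, void **root_slot,
                                unsigned user_type, void *extra_data)
{
    (void)env;
    if (user_type == USER_REF_DISJOINT_UNION) {
        int *discriminator = (int *)extra_data;
        if (*discriminator == 0) {   /* hypothetical "pointer" arm of union */
            /* *root_slot really holds a heap pointer: report/forward it.   */
            (void)root_slot;
        }
        /* Otherwise the slot holds a non-pointer payload and is skipped.   */
    }
}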

I agree that much of this is at a higher level that LLVM should not have to worry about, but having such a generalized ref type in the core seems important so you can handle ref params and refs in registers.  You can define it as data structures in the core and implement a library on top.


You can then call another runtime function, prtEnumerateRootsOfActivation, to ask for the roots associated with the given stack iterator's current activation to be enumerated.  To this function, you pass a PrtRseInfo struct that contains a callback that gets invoked once per live ref found and an opaque environment variable passed (unexamined) on to the callback function.  To go to the next activation on the call stack, you then call the runtime's prtNextActivation.  There are routines to determine when you've reached the bottom of the call stack as well.  In addition, there are routines for performing other (non-GC) operations on the stack iterator's activation, such as looking up data associated with the SPAN construct (taken from C--) or exception handlers.
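
Put together, a collector's stack walk using these entry points might look roughly like the sketch below.  The exact signatures, the fields of PrtRseInfo, and the prtIsBottomOfStack name are assumptions for illustration, not the real Pillar API:

/* Opaque here; see the earlier iterator sketch for what it might contain. */
typedef struct PrtStackIterator PrtStackIterator;

typedef struct {
    void (*callback)(void *env, void **root_slot);  /* once per live ref    */
    void  *env;                                      /* passed through opaque */
} PrtRseInfo;

extern PrtStackIterator *prtYoungestActivation(void);
extern void prtEnumerateRootsOfActivation(PrtStackIterator *si, PrtRseInfo *rse);
extern void prtNextActivation(PrtStackIterator *si);
extern int  prtIsBottomOfStack(PrtStackIterator *si);   /* hypothetical name */

static void mark_root(void *env, void **root_slot)
{
    /* Stand-in for the collector's per-root work (marking or forwarding). */
    (void)env; (void)root_slot;
}

static void enumerate_stack_roots(void *gc_state)
{
    PrtRseInfo rse = { mark_root, gc_state };
    PrtStackIterator *si = prtYoungestActivation();

    while (!prtIsBottomOfStack(si)) {
        prtEnumerateRootsOfActivation(si, &rse);
        prtNextActivation(si);
    }
}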

In my own case, I don't have an iterator for roots within a single activation, rather I have a static data structure that describes how to find all of the roots within the call frame. One of my design goals is to minimize the number of function calls required to trace a pointer, so the different parts of my code accept whole stack maps as arguments instead of single pointers.
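
For example, a static per-safe-point descriptor along these lines might look like the sketch below; all of the names are illustrative assumptions rather than actual data structures:

#include <stdint.h>

/* Illustrative static frame map: for one safe point, a fixed table of
 * frame-relative offsets at which roots live.  The collector walks the
 * table directly instead of invoking a callback per root. */
typedef struct {
    uintptr_t      return_address;  /* identifies the safe point          */
    uint32_t       num_roots;
    const int32_t *root_offsets;    /* offsets from the frame base        */
} FrameRootMap;

/* Trace every root in one frame, given the frame's base address. */
static void trace_frame(const FrameRootMap *map, char *frame_base)
{
    for (uint32_t i = 0; i < map->num_roots; i++) {
        void **slot = (void **)(frame_base + map->root_offsets[i]);
        /* gc_trace(slot);  -- stand-in for the collector's work */
        (void)slot;
    }
}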

However, the LLVM approach to generating stack maps is flexible enough to handle either case - your GCStrategy plugin gets a list of all stack roots for each safe point, and you can transform that into whatever data structures you need.
[TAA]
They seem interchangeable...if you have a data structure you can just as easily write an iterator.  I'm curious if you see this as performance critical.  In my experience, GC times tend to be completely dominated by walking the live objects and not root set enumeration.