[LLVMdev] LLVM and managed languages

Fri Jul 1 17:33:09 PDT 2011

On Fri, Jul 1, 2011 at 4:41 PM, Andrew Trick <atrick at apple.com> wrote:

> Hi Talin,
>
> I have some questions below. If these topics have already been discussed in
> earlier threads, kindly point me there. I'm aware of your GC proposal, but
> the rest is new to me.
>
> On Jul 1, 2011, at 11:05 AM, Talin wrote:
>
> *Garbage collection is still way too difficult*. The biggest problem is
> the inability to track SSA values - it requires the frontend to generate
> very inefficient and error-prone code, manually spilling SSA values to the
> stack around nearly every function call. I've written proposals for
> improving the situation, which I won't repeat here - but realistically
> there's no way that I am ever going to have time learn to enough about
> LLVM's code generation passes to implement something like this myself.
>
> Another area which I'd like to see supported is stack walking - that is,
> starting from the current call frame, iterate through all of the call frames
> above it. Currently the only way to do this is via platform-specific
> assembly language - I'd think it would be relatively simple to make an
> intrinsic that does this. (Note that the current stack intrinsics are
> unusable, because they are (a) unreliable, and (b) inefficient. Taking an
> integer index of a stack frame as a parameter is not the right way to do
> it.)
>
> I have to say, if there's any single issue that could make me give up on
> LLVM and switch over to using something like a JVM, it would be this -
> because as far as I can tell, LLVM's garbage collection features are in
> exactly the same state they were in when I started working with LLVM four
> years ago.
>
>
> JVM implementers have done a lot of work in their respective compiler
> backend to support moving collectors. I'm not aware of any efficient
> solution short of making the backend aware of object pointer types. i.e.
> Machine operands need to carry these types.
>
> So, I have a working implementation of a moving collector that uses the
LLVM intrinsics. This consists of three components:

   - My frontend code that tells LLVM which data values contain traceable
   roots.
   - My custom GCStrategy class which uses information about the roots to
   generate a stack map for every function identifying the relative offset of
   each root.
   - My runtime library which traverses those stack maps.

All of this is pretty standard stuff. What's nice about it is that LLVM
doesn't really need to know any of the details of my garbage collection
algorithm, or even what a root looks like - I'm sure you saw my earlier
discussion of discriminated unions as roots, where the collector needs
access to the discriminator field to determine whether the payload is a
pointer or not. Since the llvm.gcroot intrinsic causes the root to be passed
to the strategy with no assumptions about its internal structure, that gives
me a lot of flexibility.

For having SSA values as roots, the situation is obviously more complex. One
simplifying assumption is that we don't try and trace register values
directly (because a structure may have been 'sliced' into multiple registers
and it would be hard for the GCStrategy to handle that), but rather do what
we do now and say that the collector can only trace objects which live in
memory. Thus, we still need to spill SSA values into memory and reload them
- but that work should be done by LLVM and not by the frontend, because LLVM
has far more detailed knowledge of value lifetimes.

The other hard bit is how to 'mark' arbitrary values as roots. Right now
values are marked using the llvm.gcroot intrinsics, but that only works on
allocas. My current thinking is that we add a second bit to the "Struct"
type (the first bit being isPacked) called "isRoot". If you want to declare
a pointer as a root, then wrap it in a struct having that bit set - so you
have to add an extra 0 into your GEP to get the value out, but that's
trivial. Since a struct can contain any other data type, it makes it easy to
create roots having any desired structure. Again, these get communicated
from the frontend to the strategy pass without interpretation by LLVM,
except for one stipulation, which is that the structure isn't 'sliced', that
the offset given to the strategy pass points to a complete copy of the
struct.

> For stack walking, what do you need in addition to DWARF unwind? Or is
> generating correct unwind tables the problem? I'm not sure what stack
> intrinsics you mean... returnaddress and frameaddress? They don't always
> work if you provide the necessary target options? It almost sounds like you
> want more control than libunwind provides? Or do you want more efficiency?
>
> Here's what my stack walker for x86 platforms looks like now:

struct CallFrame {
  CallFrame * prevFrame;
  void * returnAddr;
};

void GC_traceStack(tart_object * traceAction) {
  CallFrame * framePtr;
  #if _MSC_VER
    #if SIZEOF_VOID_PTR == 4
      __asm {
        mov framePtr, ebp
      }
    #else
      __asm {
        mov framePtr, rbp
      }
    #endif
  #else
    #if SIZEOF_VOID_PTR == 4
      __asm__("movl %%ebp, %0" :"=r"(framePtr));
    #else
      __asm__("movq %%rbp, %0" :"=r"(framePtr));
    #endif
  #endif

  while (framePtr != NULL) {
    void * returnAddr = framePtr->returnAddr;
    framePtr = framePtr->prevFrame;
    TraceDescriptor * tdesc = GC_lookupStackFrameDesc(returnAddr);
    if (tdesc != NULL) {
      TraceAction_traceDescriptors(traceAction, (void *)framePtr, tdesc);
    }
  }
}

Basically it gets the bp register as a pointer, and treats it as the head of
a linked list. It iterates through the list nodes until it finds NULL. For
each node, it finds the return address (which is stored next to the node
pointer), and looks that up in a map which returns the stack frame
descriptor for that function. As you can see this is extremely simple and
fast.

If I were designing a set of llvm intrinsics to replicate this, I would have
three functions: llvm.callframe.current() simply returns the value of the bp
register; llvm.callframe.next() takes the address of the previous stack
frame and returns its parent; and llvm.callframe.returnaddress() takes the
address of a call frame and returns the associated return address. These
three intrinsics should (I hope) be implementable on any platform that LLVM
targets, and would (I would think) each lower into a single instruction.

This would avoid me having to have a different set of inline assembly for
each different processor family that LLVM supports.

> *Light-weight coroutines* would be a "nice to have", as would better *concurrency
> primitives*.
>
>
> This could be interesting. Can you explain in more detail or just point me
> to some examples?
>
> I'm thinking of the kind of things that Go or IBM's X10 language does, or
possibly Python's generators. Each language is of course going to create its
own model of concurrency, but that model is going to be created from
lower-level primitives that are either provided by LLVM (things like
stack-swapping and thread-local variables) or provided by the various
operating system libraries (thread creation and semaphores and such.)

> There's been a lot of discussion about divide-by-zero errors and other *non-declared
> exceptions*. Having this available would be a great help.
>
> What is a non-declared exception (no throw statement in the source code?),
> and do you think something is missing from LLVM IR? I don't see any
> difficulty expressing Java/C# exceptions in LLVM IR. Do you want to build
> high-level language semantics into LLVM IR to avoid duplicating effort
> across Java/C#/Tart frontends? Or do you want better optimization/codegen in
> these cases (language-friendly backend)?
>
> There's been several threads on this lately, and I am by no means an expert
in this area, but I do understand that the basic problem of jumping out from
the middle of a basic block is a hard one to solve - but I would like to see
it solved. I can work with whatever solution you guys come up with.

> Keep in mind that adding higher level IR constructs only inhibits
> optimization. Java bytecode bakes language semantics into opcodes as a
> compression technique. It's not a form that is amenable to optimization.
>
> Maybe what you really want is a managed language toolkit for use in
> frontends such as yours?
>
> Backing up now to the overall topic, which of these general themes fits
> your experience:
> 1) You want more efficient support for implementing advanced runtime
> features for a given target
> 2) You want more ABI abstraction for easier porting across a range of
> targets
>
> I think #2 is more interesting to me. It's not hard to do #1 for a given
platform, as there are libraries a-plenty to do most of this stuff.

Think of it this way: The name of the project is LLVM: "Low-Level Virtual
Machine". I realize that's a bit of a misnomer, in that LLVM isn't primarily
about executing code, but more about preparing code for execution. (I
suppose we could call it LLISEE - Low Level Instruction Set and Execution
Environment - but that's a mouthful. You could also call it LLVVM - Low
Level "Virtual" Virtual Machine, in that the VM doesn't actually exist). But
the "VM" part of the name seems to imply that it's not just a compiler, but
a framework for execution and for building environments in which execution
happens.

-Andy

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110701/74ea83a5/attachment.html>