[LLVMdev] FW: LLVM IR is a compiler IR

Thu Oct 6 14:20:53 PDT 2011

Michael Clagett <mclagett at hotmail.com> writes:

> It is actually the first of your alternatives above that I was hoping
> to achieve and the reason I was thinking this would be valuable is
> twofold.  First, I don't have so much faith in the quality or optimal
> character of my own byte code implementations.  The core 32 opcodes
> tend to be high-level implementations of low-level operations in
> dealing with the elements of the virtual machine

Ok, so these are mildly complex operations.  It makes sense to start
with some kind of machine-generated asm implementation to get optimized
performance.  Don't write the opcode implementations in asm directly.
Write them in a high level language and compile them to native code.
See below.

> The top of data and return stacks are mapped to register EAX and EDI,
> respectively, and the address reg is mapped to ESI.

Does this have to be the case?  See below.

> It was my general feeling that a good SSA-based compilation mechanism
> like that of LLVM could do a better job at maximizing the use of the
> Intel's limited resources than I could.

As an alternative to using the JIT, would it be possible to implement
each opcode in its own interpreter function and compile them statically?
Of course there would be call overhead for every opcode interpreted.
Once you've got that working you could apply various techniques such as
threading the interpreter (not multiprocessing, but threaded code as in
http://en.wikipedia.org/wiki/Threaded_code) to reduce that overhead.
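
To make the shape of that concrete, here is a rough sketch in Python
(chosen only for brevity; the same structure applies in C or in LLVM
IR).  The three opcodes, the dict-based VM state and all function names
are made up for illustration and are not your 32 opcodes.  It handles
straight-line code only, with no branches.

```python
# Hypothetical three-opcode stack VM; names and encoding are invented
# for illustration only.
PUSH, ADD, HALT = 0, 1, 2

def op_push(vm, arg):
    vm["stack"].append(arg)

def op_add(vm, arg):
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a + b)

# One function per opcode, looked up through a table.
HANDLERS = {PUSH: op_push, ADD: op_add}

def run_switch(bytecode):
    """Plain interpreter: decode the opcode number on every step."""
    vm = {"stack": []}
    pc = 0
    while True:
        opcode, arg = bytecode[pc]
        if opcode == HALT:
            return vm["stack"]
        HANDLERS[opcode](vm, arg)
        pc += 1

def thread(bytecode):
    """Replace opcode numbers with handler references, once, up front."""
    return [(HANDLERS.get(opcode), arg) for opcode, arg in bytecode]

def run_threaded(threaded):
    """Threaded interpreter: no opcode decoding left in the loop."""
    vm = {"stack": []}
    for handler, arg in threaded:
        if handler is None:  # HALT has no handler, so thread() maps it to None
            break
        handler(vm, arg)
    return vm["stack"]

program = [(PUSH, 2), (PUSH, 3), (ADD, None), (HALT, None)]
print(run_switch(program))            # [5]
print(run_threaded(thread(program)))  # [5]
```

In Python this only shows the structure.  The actual savings show up in
compiled code, where each handler can jump straight to the next one
instead of returning to a central dispatch loop.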

I don't think there's any particular reason to rely on the JIT unless
you want to take it a bit further and optimize a specific sequence of
opcodes seen when interpreting a specific program.  But then we're
getting into the various Futamura projections.  :)

> Moreover, as long as code remains at the VM instruction level, these
> resources are even more constrained than usual.  EDX needs to be
> preserved to hold the VM instruction pointer.  EAX, ESI and EDI need
> to be preserved for the purposes outlined above.

Why do those registers need to be preserved?  Imagine the interpreter
were written completely in a high level language.  The compiler doesn't
care which register holds a stack pointer, data pointer, etc. as long as
the virtual machine's state is consistent.

> Similar considerations apply to simply reducing from 10 instructions
> to 1 or 2 instructions operations that at the VM level require the
> stack, but that at the intel assembler level would more naturally be
> handled in registers.

Ah, ok, this is interesting.  You want to change the execution model
on-the-fly.  A grad school colleague of mine did something very similar
to this, translating a stack machine into a register machine.  Of course
he's a hardware nut so he designed hardware to do it.  :) Unfortunately,
I don't think he ever published anything on it.

Either the threading approach mentioned above or the JIT/Dynamo
approach mentioned below can accomplish this, I think, and without any
register constraints, if I'm understanding you correctly.
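
A minimal sketch of the stack-to-register idea, again in Python for
brevity: simulate the operand stack at translation time so that it
holds register names rather than values.  The opcode names, the
"li" (load immediate) instruction and the r0, r1, ... naming are all
assumptions for illustration, and it handles straight-line code only.

```python
import itertools

def stack_to_registers(stack_code):
    """Translate straight-line stack code into three-address code.

    The operand stack is simulated at translation time, so it holds
    register names instead of values and disappears from the output.
    """
    fresh = (f"r{i}" for i in itertools.count())
    stack = []  # translation-time stack of register names
    out = []    # emitted register instructions
    for op, arg in stack_code:
        if op == "push":
            reg = next(fresh)
            out.append(("li", reg, arg))  # load immediate into a new register
            stack.append(reg)
        elif op in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            reg = next(fresh)
            out.append((op, reg, lhs, rhs))
            stack.append(reg)
        elif op == "dup":
            stack.append(stack[-1])  # pure stack shuffle: no instruction emitted
        else:
            raise ValueError(f"unknown op: {op}")
    return out, stack

# Computes 4 * 4 + 1 on the stack machine.
code = [("push", 4), ("dup", None), ("mul", None), ("push", 1), ("add", None)]
regs, live = stack_to_registers(code)
for insn in regs:
    print(insn)
```

Each register is assigned exactly once, so the output is already close
to SSA form.  Emit something like this as LLVM IR and the register
allocator decides where the values actually live, which is why I don't
think you need to pin anything to EAX, ESI or EDI yourself.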

> Finally, I just have the general feeling that more significant
> compiler optimizations can be effected across sequences of what are my
> vm opcode implementations.  This is a general feeling, but I'm hoping
> fairly well-founded.

Yes, that's true.  See the Futamura reference above.  Given a VM and an
input program, you can in fact generate an optimized executable.  This
is the logical extension of what you're getting at.
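
As a toy illustration of specializing an interpreter to one input
program, consider the sketch below.  Note that specialize() is written
by hand here; a real partial evaluator would derive it mechanically
from interpret().  The opcodes and function names are hypothetical.

```python
def interpret(program, x):
    """Generic interpreter: the program is re-inspected on every run."""
    stack = [x]
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown op: {op}")
    return stack.pop()

def specialize(program):
    """Emit Python source for one fixed program, then compile it.

    The stack holds expression strings at specialization time, so the
    opcode dispatch and the stack itself are gone from the result.
    """
    stack = ["x"]
    for op, arg in program:
        if op == "push":
            stack.append(repr(arg))
        elif op in ("add", "mul"):
            b = stack.pop()
            a = stack.pop()
            sign = "+" if op == "add" else "*"
            stack.append(f"({a} {sign} {b})")
        else:
            raise ValueError(f"unknown op: {op}")
    source = f"def compiled(x):\n    return {stack.pop()}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["compiled"], source

program = [("push", 3), ("mul", None), ("push", 1), ("add", None)]
compiled, source = specialize(program)
print(source)
print(interpret(program, 5), compiled(5))  # 16 16
```

The generic interpreter and the specialized function compute the same
result, but the specialized one has no trace of the VM left in it.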

For this kind of thing a JIT makes sense.  You might have a look at what
the HP people did with Dynamo.  They got a lot of performance out of
translating PA-RISC to PA-RISC by doing exactly what you describe.

> Hope that explains my thinking better.  Does that change at all your
> view of the benefits that I might achieve from LLVM?

No, it doesn't change my view: I still think LLVM will work well for
this.  JIT compile time could be an issue, but that cost will be
amortized if the opcode sequence is executed enough times.

One way to speed up the JIT is to pre-generate a set of instruction
templates for each opcode that get filled in with specific information
available at runtime.  See the papers on DyC for some examples.  I
believe the Dynamo folks also took this route.  This would be quite
extensive work in LLVM but would be very valuable, I think.  Partial
evaluation papers may also be useful to explore.
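
The template idea can be sketched roughly as follows.  This is only an
analogy in Python source text, not how DyC or Dynamo are actually
implemented: the templates, the {arg} hole and the jit() function are
invented for illustration.  A real version would stitch together
pre-compiled machine-code fragments and patch in the operands.

```python
# One pre-written template per opcode.  "{arg}" marks a hole that is
# filled in with information only known at runtime.
TEMPLATES = {
    "push": "    stack.append({arg})\n",
    "add": "    b = stack.pop(); a = stack.pop(); stack.append(a + b)\n",
    "mul": "    b = stack.pop(); a = stack.pop(); stack.append(a * b)\n",
}

def jit(program):
    """Stitch templates together with no analysis, then compile once."""
    body = "".join(
        TEMPLATES[op].format(arg=repr(arg)) for op, arg in program
    )
    source = "def generated(stack):\n" + body + "    return stack\n"
    namespace = {}
    exec(compile(source, "<jit>", "exec"), namespace)
    return namespace["generated"]

# Computes 2 * 5 on the stack machine.
run = jit([("push", 2), ("push", 5), ("mul", None)])
print(run([]))  # [10]
```

Code generation here is just copy-and-fill, so it is fast, at the cost
of doing no optimization across neighbouring templates.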

HTH.

                             -Dave