[LLVMdev] Newbie

Tilmann Scheller tilmann.scheller at googlemail.com
Sat Apr 5 10:30:26 PDT 2008


On Tue, Apr 1, 2008 at 9:49 AM, Vania Joloboff <vania at liama.ia.ac.cn> wrote:

> Hello,
>
> We are a research project in a joint French-Chinese laboratory. We are
> considering using LLVM in our project, but we'd like to have some
> additional info before we dive in. Since we are new kids on the block,
> please bear with us...
>
> We are interested in using LLVM for emulation of real hardware. What we
> have as input is the binary code of the program to run. Today we emulate
> each instruction's behavior sequentially, which has pros and cons. We
> want to build a faster simulator, and one idea is to decompile the binary
> code into an LLVM representation, then compile it for the simulation host
> and run it. Hopefully it would be faster because we may be able to use
> one LLVM instruction for several machine instructions, and we can benefit
> from the real host stack and the real registers instead of a simulated
> stack and simulated registers.
>
> So we have several questions:
>
> 1. Do you have an opinion on the feasibility of the project?
>    Do you know if it has been done before?
>
Using LLVM for dynamic binary translation is definitely feasible. Last
year, during Google Summer of Code 2007, I worked on llvm-qemu, which does
exactly that: it is a modified version of qemu that uses the LLVM JIT for
optimization and code generation. Currently it translates ARM machine code
to LLVM IR (at basic block level) and, via the LLVM JIT, to x86 machine
code. All source architectures supported by qemu (x86, x86-64, ARM, SPARC,
PowerPC, MIPS, m68k) can be translated to LLVM IR this way (adding support
for any of these architectures requires only minor changes to llvm-qemu).
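
To give an idea of the shape of such a translator, here is a rough sketch
of how one guest basic block can be turned into one LLVM function. This is
purely illustrative, not llvm-qemu code: GuestInsn, decodeOne, emitAdd and
OP_ADD are made-up stand-ins for a real decoder, and the exact header
paths and class names of the C++ API differ between LLVM versions.

    #include <cstdint>
    #include <string>
    #include "llvm/IR/DerivedTypes.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"

    // Hypothetical guest-side pieces (stand-ins for a real decoder).
    struct GuestInsn { unsigned opcode; unsigned size; bool endsBlock; };
    enum { OP_ADD /* ... one value per guest opcode ... */ };
    bool decodeOne(uint32_t pc, GuestInsn &out);             // real decoder goes here
    void emitAdd(llvm::IRBuilder<> &B, const GuestInsn &I);  // emits IR for one guest ADD

    // One guest basic block becomes one LLVM function; it returns the next
    // guest PC so the dispatcher knows where to continue.
    llvm::Function *translateBlock(llvm::Module &M, uint32_t guestPC) {
      llvm::LLVMContext &Ctx = M.getContext();
      llvm::FunctionType *FTy =
          llvm::FunctionType::get(llvm::Type::getInt32Ty(Ctx), false);
      llvm::Function *F = llvm::Function::Create(
          FTy, llvm::Function::ExternalLinkage,
          "block_" + std::to_string(guestPC), &M);
      llvm::IRBuilder<> B(llvm::BasicBlock::Create(Ctx, "entry", F));

      GuestInsn I;
      while (decodeOne(guestPC, I)) {
        switch (I.opcode) {
        case OP_ADD: emitAdd(B, I); break;  // one case per guest instruction
        // ...
        }
        guestPC += I.size;
        if (I.endsBlock) break;             // a branch ends the basic block
      }
      B.CreateRet(B.getInt32(guestPC));     // hand the next PC to the dispatcher
      return F;
    }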

The end result was that llvm-qemu ran at about half the speed of regular
qemu on the synthetic benchmark nbench (using a hotspot-like approach:
interpreting blocks with few executions and JITing blocks with high
execution counts). However, there is still potential for improvement. One
improvement is an efficient implementation of direct block chaining: in
certain cases a block can jump directly to its successor instead of
falling back to the dispatcher. This is currently implemented with calls
instead of jmps; after the recent work on tail call optimizations it
should now be possible to use jmps. Direct block chaining is a very useful
optimization: on the nbench test case, enabling it for regular qemu yields
a 100% speed increase. Another promising improvement would be the ability
to build "super"-blocks from a set of connected basic blocks, resembling a
"hot path". This work is partially finished and, once complete, should
yield a significant performance improvement, since a "super"-block offers
far more optimization potential than a single basic block.

Nevertheless, it is unlikely that llvm-qemu will ever be much faster than
regular qemu as long as it replaces qemu's code generator completely
(which it currently does). Regular qemu has a very lightweight code
generator (it basically copies blocks of memory, patches them up, and only
does static register allocation) which generates reasonably good code with
very low compilation overhead. In contrast, the LLVM JIT generates really
high-quality code (in fact the JIT and the static compiler share the same
code generator), but at a higher price in compilation time. Ideally,
qemu's current code generator would coexist with the LLVM JIT in
llvm-qemu, allowing for different trade-offs between code quality and
compilation time depending on the execution frequency of a particular
block.
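
For what it's worth, the hotspot-like scheme itself is conceptually quite
simple. Here is a sketch of such a dispatcher, again purely illustrative:
the threshold value and the interpretBlock/compileBlock hooks are
invented, not taken from llvm-qemu.

    #include <cstdint>
    #include <map>

    using CompiledBlock = uint32_t (*)();      // JITed block, returns next guest PC

    uint32_t interpretBlock(uint32_t pc);      // hook into the existing interpreter
    CompiledBlock compileBlock(uint32_t pc);   // translate + JIT one block via LLVM

    void run(uint32_t entryPC) {
      const unsigned kHotThreshold = 50;       // arbitrary example value
      std::map<uint32_t, unsigned> counts;     // executions per block
      std::map<uint32_t, CompiledBlock> cache; // already-JITed blocks

      uint32_t pc = entryPC;
      for (;;) {
        auto it = cache.find(pc);
        if (it != cache.end()) {
          pc = it->second();                   // hot block: run native code
        } else if (++counts[pc] >= kHotThreshold) {
          cache[pc] = compileBlock(pc);        // block just became hot: JIT it once
          pc = cache[pc]();
        } else {
          pc = interpretBlock(pc);             // cold block: keep interpreting
        }
      }
    }
    // Direct block chaining goes one step further: instead of returning to
    // this loop, a compiled block jumps straight to its known successor.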

I guess that in your situation the chances are much higher that you will
see a significant performance increase, since you apparently don't do any
dynamic binary translation yet; this is especially true if you decide to
use your existing interpreter in combination with the LLVM JIT in a
hotspot-like manner.

An important question is how you perform the translation from your source
architecture to LLVM IR. For llvm-qemu I benefited from the fact that qemu
translates source machine code into an intermediate representation whose
instructions are implemented in C, so I could use llvm-gcc to compile
those instructions to equivalent LLVM IR and did not have to worry about
the actual translation from machine code to qemu IR. Going directly from
machine code to LLVM IR certainly requires more effort.
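
To illustrate what I mean (my own made-up example, not actual qemu source;
the CPUState layout and op_addl are invented): the micro-ops are ordinary
C functions operating on a CPU state structure, so a file of such helpers
can be compiled with llvm-gcc's -emit-llvm option and the resulting IR can
be inlined and specialized for each decoded guest instruction.

    /* Illustrative only -- not actual qemu source. Compiling this file with
       "llvm-gcc -c -emit-llvm op.c" yields LLVM bitcode for the helper,
       which the translator can then inline per guest instruction. */
    #include <stdint.h>

    typedef struct { uint32_t regs[16]; uint32_t cpsr; } CPUState;

    void op_addl(CPUState *env, int rd, int rn, int rm) {
        env->regs[rd] = env->regs[rn] + env->regs[rm];
    }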

Which architectures are you interested in particularly?



> 3. We want to generate the in-memory IR directly and dynamically call
>    the LLVM code generator on a chunk of code that has been decompiled,
>    not a complete program. Is this possible? Is it worthwhile in terms
>    of performance?
>
Yes, that's perfectly possible and that's what llvm-qemu does too
(translation is performed at basic block level).  As Patrick already pointed
out, static recompilation is not really feasible in most cases.
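
As a minimal sketch of what "building the IR in memory and handing it to
the code generator" looks like (illustrative only; the exact headers,
class names and JIT interface of the C++ API differ between LLVM versions,
and this particular sketch assumes a release with the MCJIT-based
ExecutionEngine):

    #include "llvm/ExecutionEngine/ExecutionEngine.h"
    #include "llvm/ExecutionEngine/MCJIT.h"
    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/Support/TargetSelect.h"
    #include <memory>

    int main() {
      llvm::InitializeNativeTarget();
      llvm::InitializeNativeTargetAsmPrinter();

      llvm::LLVMContext Ctx;
      auto M = std::make_unique<llvm::Module>("chunk", Ctx);

      // Build "int add1(int x) { return x + 1; }" directly in memory.
      llvm::FunctionType *FTy = llvm::FunctionType::get(
          llvm::Type::getInt32Ty(Ctx), {llvm::Type::getInt32Ty(Ctx)}, false);
      llvm::Function *F = llvm::Function::Create(
          FTy, llvm::Function::ExternalLinkage, "add1", M.get());
      llvm::IRBuilder<> B(llvm::BasicBlock::Create(Ctx, "entry", F));
      B.CreateRet(B.CreateAdd(F->getArg(0), B.getInt32(1)));

      // Hand the module to the JIT and call the freshly generated code.
      llvm::ExecutionEngine *EE = llvm::EngineBuilder(std::move(M)).create();
      auto add1 = (int (*)(int))EE->getFunctionAddress("add1");
      return add1(41);  // 42
    }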

If you're interested, you can find llvm-qemu at
http://code.google.com/p/llvm-qemu/, the Wiki contains a page which lists
the progress of the project (including some numbers regarding performance).

Greetings,

Tilmann Scheller