[LLVMdev] Newbie

Tue Apr 1 19:52:54 PDT 2008

Vania Joloboff wrote:
> Hello,
>
> We are a research project in a joint French-Chinese laboratory. We are
> considering using LLVM in our project, but we'd like some additional
> information before we dive in. Since we are new kids on the block, please
> bear with us...
>
> We are interested in using LLVM to emulate real hardware. Our input is the
> binary code of the program to run. Today we emulate the behavior of each
> instruction sequentially, which has pros and cons. We want to build a faster
> simulator, and one idea is to decompile the binary code into an LLVM
> representation, then compile it for the simulation host and run it. Hopefully
> this would be faster, because we may be able to use one LLVM instruction for
> several machine instructions, and we can benefit from the real host stack and
> the real registers instead of a simulated stack and simulated registers.
Very cool.  I'm probably not the person best qualified to answer your 
questions, but since no one else has answered them yet, I'll take a shot.
>
> So we have several questions:
>
> 1. Do you have an opinion on the feasibility of the project?
>    Do you know whether it has been done before?
There was a Google Summer of Code (GSoC) project last year where someone 
started the work of modifying Qemu (a simulator) to use LLVM for JIT 
compilation for faster simulation.  I don't know how well it worked, but 
it's very similar to what you want to do.  I'd say it's quite feasible, 
and, in fact, LLVM should make it easier with its JIT libraries.
>
> 2. There is an in-memory representation for LLVM. Where should we look in the
>    documentation to understand how to generate it properly?
The LLVM Programmer's Manual
(http://llvm.org/docs/ProgrammersManual.html) might be a good place to
start; the latter half of the document describes the basic classes used
for the LLVM in-memory IR.  The doxygen documentation
(http://llvm.org/doxygen/) is also surprisingly useful for the details
of the LLVM programming APIs.

The in-memory representation is very easy to generate if you're writing
your program in C++.  Basically, there is a C++ class for each type of
object in the LLVM IR.  To create the in-memory IR, you simply create a
new object of the correct class.  For example, to create a new function,
you create a new Function object (e.g. Function *f = new
Function(...)).
>
> 3. We want to generate the in-memory IR directly and dynamically call the LLVM
>    code generator on a chunk of code that has been decompiled, not on a
>    complete program. Is this possible? Is it worthwhile in terms of performance?
This should be possible using the JIT libraries included with LLVM.  I 
have not used these extensively, but I'm sure someone else on the list 
has and would be happy to answer any specific questions you may have.

I'm not sure whether it will be worthwhile in terms of performance, but
since you are currently emulating each instruction one at a time, I'd
expect dynamic binary translation with LLVM to be much faster.

-- John T.

>
>
> Sincerely,
> -- Vania
>
> ================================================
> Vania JOLOBOFF
> LIAMA Sino French Laboratory
> 95 Zhongguancun East Road
> Beijing 100080, China
> Tel +86 10 8261 4528     http://liama.ia.ac.cn/
> vania at liama.ia.ac.cn  or vania.joloboff at inria.fr
>