[LLVMdev] Advice - llvm as binary to binary translator ?

Tue Jun 24 02:28:06 PDT 2008

You may also want to take a look at Valgrind. It can translate IA-32/64, 
PPC-32/64, and other architectures to its own SSA-style IR, and it has 
backends from that IR to the very same architectures. You could also build a 
backend that emits LLVM IR and let LLVM further optimize the generated code 
(although Valgrind already has its own optimizers).
Building such translators is no easy business; I can tell you that from 
experience. Depending on what you want to achieve, I would reuse something 
that already exists.

Nuno

----- Original Message ----- 
From: "Harry Metcalfe" <H.S.Metcalfe at sussex.ac.uk>
To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
Sent: Tuesday, June 24, 2008 9:28 AM
Subject: Re: [LLVMdev] Advice - llvm as binary to binary translator ?

> Hi Erik,
>
> I'm currently writing an IA-32 to LLVM IR translator. I'm only midway
> through, but I can certainly say that there have been more difficulties
> than I anticipated when I began!
>
> I think that it is a reasonable approach, perhaps especially in your
> case, since you have an emulator already. Fully automatic static
> translation of IA-32 code is equivalent to the halting problem in the
> general case, though perhaps it wouldn't be for your architecture (which
> one are you using?). A dynamic phase is therefore necessary for me -- if
> it is for you too, you'll have a leg up.
>
> Self-modifying code is both hideous and unusual, and very difficult to
> deal with. I'm leaving it to one side.
>
> General thoughts: are you sure that LLVM IR is suitable? You may be
> better off with a lower-level representation. At least in my case, LLVM
> enforces a level of structure that doesn't exist in machine code. That's
> something you'll also probably have to deal with.
>
> Its type system also hampers the modification of translated code, so
> it's advantageous to ensure that you won't need to change any code once
> translated. This is of particular importance when you're trying to
> figure out the bounds of an array, and things like that: a change to the
> size of an array is a change of its type, which means it's much easier
> just to get the size of the array right in the first place. I'm
> currently in the process of altering my code so that a lot more analysis
> takes place before translation even begins!
>
> Finally, how will you deal with memory accesses and aliasing? This is
> certainly the thorniest problem, and it's the one my dynamic phase exists
> to solve.
>
> Do email me off-list if you like -- it sounds like we're pursuing
> similar lines of inquiry!
>
> Harry
>
>
>
> On Sat, 2008-06-21 at 21:53 -0400, Erik Buck wrote:
>> First, is there a way to search the archives for this list?  I
>> apologize in advance if I have stepped on a FAQ.
>>
>> My goal is to execute legacy binary machine code from a very old,
>> one-of-a-kind computer on a variety of modern computers.  I already wrote
>> an emulator for the legacy machine that executes the old machine
>> code.  However, my emulator is just an interpreter and therefore has
>> some limitations:
>>
>> - The emulator spends a lot of time in an executive loop that fetches
>> legacy instructions, decodes them, and jumps to appropriate C
>> functions that emulate each legacy instruction.  The executive loop
>> also has to handle emulated interrupts, support single-step debugging,
>> etc.
>>
>> - The emulator is compiled and run on only a few modern hardware/
>> operating system combinations.  The emulator is fairly portable, but
>> extensive optimizations on some platforms restrict capabilities on
>> other platforms.
>>
>> - The emulator executes the legacy machine code unmodified, which is
>> good, but that means opportunities for optimization are lost.  The
>> legacy machine code is full of dead code, jumps to jumps, redundant
>> sub-expressions, unnecessary memory accesses, etc.  Back in the old
>> days, compilers really didn't optimize at all.  They generated
>> horrible code that was sometimes hand modified.
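
A minimal sketch in C of the kind of executive loop described in the first
point above: fetch an instruction, decode it, and dispatch to one C function
per opcode. The toy instruction set, the Machine struct, and every name here
are hypothetical stand-ins for the real legacy machine; interrupt handling
and single-step debugging are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine: 256 bytes of memory, one accumulator.
 * Every instruction is two bytes: opcode, operand. */
typedef struct {
    uint8_t mem[256];
    uint8_t pc;
    int32_t acc;
    int     halted;
} Machine;

enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_JMP = 3, OP_COUNT };

typedef void (*Handler)(Machine *m, uint8_t operand);

/* One C function per emulated instruction. */
static void op_halt(Machine *m, uint8_t operand)  { (void)operand; m->halted = 1; }
static void op_loadi(Machine *m, uint8_t operand) { m->acc = operand; }
static void op_addi(Machine *m, uint8_t operand)  { m->acc += operand; }
static void op_jmp(Machine *m, uint8_t operand)   { m->pc = operand; }

static const Handler handlers[OP_COUNT] = { op_halt, op_loadi, op_addi, op_jmp };

/* Executive loop: fetch, decode, dispatch.
 * Returns 0 on a clean halt, -1 on an unknown opcode or when the
 * step budget runs out. */
int run(Machine *m, unsigned max_steps) {
    assert(m != NULL);
    while (!m->halted && max_steps-- > 0) {
        uint8_t opcode  = m->mem[m->pc];                 /* fetch */
        uint8_t operand = m->mem[(uint8_t)(m->pc + 1)];
        m->pc = (uint8_t)(m->pc + 2);                    /* advance before dispatch */
        if (opcode >= OP_COUNT)                          /* decode */
            return -1;
        handlers[opcode](m, operand);                    /* dispatch */
    }
    return m->halted ? 0 : -1;
}
```

Every emulated instruction pays for the fetch, the bounds check, and an
indirect call. That per-instruction overhead is what translating whole
blocks ahead of execution is meant to remove.
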
>>
>> My idea is to convert my emulator into a translator that emits LLVM IR
>> either directly or via calls to the LLVM library.  I would then
>> execute the result via JIT or native code compilation...
>>
>> Is this a reasonable approach?
>> Can this approach be used even when the legacy code is
>> self-modifying?  After a code modification, a re-translation and re-JIT
>> would be needed.
>>
>> Are there any general suggestions?
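
On the self-modifying code question, one possible way to trigger the
re-translation is to invalidate a cached translation whenever the guest
writes into memory that has already been translated. The sketch below, in C,
is only an illustration under stated assumptions: the Translator struct, the
16-byte block granularity, and all function names are hypothetical, and
translate_block is a counter standing in for "emit LLVM IR and JIT it"; it
does not call any LLVM API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE   256
#define BLOCK_SIZE 16                       /* hypothetical translation granularity */
#define NUM_BLOCKS (MEM_SIZE / BLOCK_SIZE)

typedef struct {
    uint8_t  mem[MEM_SIZE];
    int      valid[NUM_BLOCKS];   /* 1 if a current translation exists for the block */
    unsigned translations;        /* how many (re)translations have happened */
} Translator;

/* Stand-in for translating one block of legacy code and compiling it. */
static void translate_block(Translator *t, unsigned block) {
    t->valid[block] = 1;
    t->translations++;
}

/* Every guest store goes through here, so a write into already
 * translated code marks that block's translation as stale. */
void guest_store(Translator *t, uint8_t addr, uint8_t value) {
    assert(t != NULL);
    t->mem[addr] = value;
    t->valid[addr / BLOCK_SIZE] = 0;
}

/* Called before executing code at addr: translate lazily, and
 * re-translate only if the block was invalidated by a store. */
void ensure_translated(Translator *t, uint8_t addr) {
    assert(t != NULL);
    unsigned block = addr / BLOCK_SIZE;
    if (!t->valid[block])
        translate_block(t, block);
}
```

Checking every store is the simplest scheme to show, but it is costly. Finer
or coarser block sizes trade the cost of needless re-translation against the
size of the bookkeeping table.
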