[LLVMdev] Questions on LLVM and binary translation

Thu Feb 23 09:10:59 PST 2006

Hi,

I'm currently investigating whether LLVM can be used for dynamic
binary translation. My goal is to translate "source" machine code into
"target" machine code at runtime, e.g., MIPS -> x86. LLVM has a
well-defined intermediate representation that separates source and
target machine code (source -> LLVM bytecode -> target), and it is
quite an extensible and adaptable framework. So I consider LLVM a
good choice for building advanced binary manipulation tools. But I
have several questions about fitting LLVM into dynamic binary
translation use cases:

1. The current JIT implementation assumes the bytecode file is fully
generated and must be read and parsed by [BytecodeFileReader] before
JIT compilation (is that right?). Can the current LLVM be extended to
parse bytecode just-in-time, that is, to parse each block of bytecode
as soon as it becomes available? I think this would be a useful and
interesting feature for LLVM.

2. Why do the current codegen passes work one function at a time? I'd
rather generate code one basic block (BB) at a time, because some BBs
in a function may never be executed. Is there any difficulty in doing
codegen per BB?
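
As a rough illustration of the per-BB idea in question 2, here is a
toy sketch in Python. It is not LLVM code and uses no LLVM API: the
class LazyBlockTranslator, its translate() and execute() methods, and
the tiny "program" format are all hypothetical names made up for this
example. The point is only that a block is translated the first time
control reaches it, so a block that is never executed is never
translated.

```python
# Toy model of lazy per-basic-block translation.
# Assumption: nothing here is an LLVM API; all names are hypothetical.
from typing import Callable, Dict, Optional, Tuple

# A "source" block: (amount added to the accumulator, successor address or None).
SourceBlock = Tuple[int, Optional[int]]
# A "translated" block: takes the accumulator, returns (new accumulator, successor).
Translated = Callable[[int], Tuple[int, Optional[int]]]


class LazyBlockTranslator:
    def __init__(self, program: Dict[int, SourceBlock]) -> None:
        self.program = program
        # Translated blocks, keyed by source address.
        self.cache: Dict[int, Translated] = {}

    def translate(self, addr: int) -> Translated:
        """Stand-in for 'source block -> IR -> target code'."""
        delta, successor = self.program[addr]

        def run(acc: int) -> Tuple[int, Optional[int]]:
            return acc + delta, successor

        return run

    def execute(self, entry: int, acc: int = 0) -> int:
        addr: Optional[int] = entry
        while addr is not None:
            block = self.cache.get(addr)
            if block is None:
                # First visit: translate on demand and remember the result.
                block = self.translate(addr)
                self.cache[addr] = block
            acc, addr = block(acc)
        return acc


if __name__ == "__main__":
    # Block 1 is unreachable from the entry block 0, so it is never translated.
    program = {0: (1, 2), 1: (100, 2), 2: (10, None)}
    translator = LazyBlockTranslator(program)
    print(translator.execute(0))     # 11
    print(sorted(translator.cache))  # [0, 2]
```

In a real translator, translate() would be where the source block is
decoded, lowered to the intermediate representation, and compiled for
the target; whether LLVM's codegen can be driven at that granularity
is exactly what I am asking.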

Thank you for your attention. Any suggestions and comments on
applying LLVM to dynamic binary translation are most welcome.

- Daniel Bao