[LLVMdev] Using LLVM for decompiling.

John Criswell criswell at illinois.edu
Mon May 7 08:31:31 PDT 2012


On 5/7/12 5:47 AM, James Courtier-Dutton wrote:
> Hi,
>
> I am writing a decompiler. I was wondering if some of LLVM could be
> used for a decompiler.
> There are several stages in the decompiler process.
> 1) Take binary and create a higher level representation of it. Like RTL.
> 2) The output is then broken into blocks or nodes, each block ends in
> a CALL, JMP, RET, or 2-way or multiway conditional JMP.

I'm not aware of anything in LLVM that will help you with this step.  
The closest thing I can think of is Qemu, but I believe it uses dynamic 
binary translation (i.e., you have to run the binary program).
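Step 2 (splitting the decoded code into blocks that end in a CALL, JMP, RET or conditional JMP) does not need LLVM. Below is a minimal C++ sketch of the classic "leader" approach. It assumes a hypothetical, already-decoded instruction list. `Insn`, `Kind`, `Block` and `splitBlocks` are illustrative names, not any library's API. Indirect jumps, such as jump tables behind multiway branches, would need their target sets resolved first and are not modelled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical decoded-instruction record; a real front end would fill
// this in from a disassembler. Indirect jumps are not modelled.
enum class Kind { Plain, Jmp, CondJmp, Call, Ret };

struct Insn {
    std::uint64_t addr;
    Kind kind;
    std::uint64_t target;  // direct branch/call target; unused for Plain and Ret
};

struct Block {
    std::size_t first, last;  // inclusive indices into the instruction list
};

// A block starts at the first instruction, at any direct jump target, and
// immediately after any CALL, JMP, conditional JMP or RET. Call targets are
// not leaders here because they normally start a different function.
std::vector<Block> splitBlocks(const std::vector<Insn>& code) {
    std::set<std::uint64_t> leaders;
    if (!code.empty()) leaders.insert(code.front().addr);
    for (std::size_t i = 0; i < code.size(); ++i) {
        const Insn& in = code[i];
        if (in.kind == Kind::Plain) continue;
        if (in.kind == Kind::Jmp || in.kind == Kind::CondJmp)
            leaders.insert(in.target);
        if (i + 1 < code.size()) leaders.insert(code[i + 1].addr);
    }
    // Walk the code once: each leader opens a new block, and every other
    // instruction extends the current one.
    std::vector<Block> blocks;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (leaders.count(code[i].addr)) blocks.push_back({i, i});
        else blocks.back().last = i;
    }
    return blocks;
}
```

For example, a conditional jump at 0x04 to 0x10, an unconditional jump at 0x0c to 0x14, a call at 0x10 and a return at 0x14 yield four blocks: instructions [0,1], [2,3], [4] and [5].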

> 3) The blocks or nodes are then analyzed for structure in order to
> extract loop information and if...then...else information.

Once you've completed steps one and two (i.e., you've converted the 
binary instructions to LLVM IR and discovered the basic blocks), yes, 
LLVM's existing analysis passes should help you with this third step.  
LLVM has passes that normalize loops (LoopSimplify), identify the loops 
in a function's control-flow graph (LoopInfo), and compute dominators 
and post-dominators (DominatorTree, PostDominatorTree), among others.
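For code that is already in LLVM IR these analyses come for free. To show what they compute, here is a standalone C++ sketch. It is not LLVM's implementation, which uses a more efficient dominator algorithm. The sketch covers the textbook iterative dominator computation, back-edge detection, natural-loop collection, and post-dominators obtained by running the same routine on the reversed graph. It assumes a hypothetical CFG given as adjacency lists of block numbers with a single exit block. All function names are made up for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical CFG: g[u] lists the successor blocks of block u.
using Graph = std::vector<std::vector<int>>;

// Flip every edge: gives predecessor lists, or the graph for post-dominators.
Graph reversed(const Graph& g) {
    Graph r(g.size());
    for (int u = 0; u < static_cast<int>(g.size()); ++u)
        for (int v : g[u]) r[v].push_back(u);
    return r;
}

// Iterative dataflow: Dom(root) = {root}, and Dom(v) = {v} union the
// intersection of Dom(p) over all predecessors p of v, repeated to a fixpoint.
// Called on reversed(g) with the exit block as root, it gives post-dominators.
std::vector<std::set<int>> dominators(const Graph& succ, int root) {
    const Graph pred = reversed(succ);
    const int n = static_cast<int>(succ.size());
    std::set<int> all;
    for (int i = 0; i < n; ++i) all.insert(i);
    std::vector<std::set<int>> dom(n, all);
    dom[root] = {root};
    for (bool changed = true; changed;) {
        changed = false;
        for (int v = 0; v < n; ++v) {
            if (v == root) continue;
            std::set<int> next = all;
            for (int p : pred[v]) {
                std::set<int> meet;
                std::set_intersection(next.begin(), next.end(),
                                      dom[p].begin(), dom[p].end(),
                                      std::inserter(meet, meet.begin()));
                next = std::move(meet);
            }
            next.insert(v);
            if (next != dom[v]) {
                dom[v] = std::move(next);
                changed = true;
            }
        }
    }
    return dom;
}

// The dominators of v form a chain, so the immediate one is the strict
// dominator whose own set is exactly one element smaller. -1 for the root.
int immediate(const std::vector<std::set<int>>& dom, int v) {
    for (int d : dom[v])
        if (d != v && dom[d].size() + 1 == dom[v].size()) return d;
    return -1;
}

// An edge u -> h is a back edge when its target h dominates its source u.
std::vector<std::pair<int, int>> backEdges(const Graph& succ,
                                           const std::vector<std::set<int>>& dom) {
    std::vector<std::pair<int, int>> edges;
    for (int u = 0; u < static_cast<int>(succ.size()); ++u)
        for (int h : succ[u])
            if (dom[u].count(h)) edges.push_back({u, h});
    return edges;
}

// Natural loop of back edge latch -> header: the header plus every block
// that reaches the latch without passing through the header.
std::set<int> naturalLoop(const Graph& succ, int latch, int header) {
    std::set<int> body = {header};
    if (latch == header) return body;  // self-loop
    const Graph pred = reversed(succ);
    body.insert(latch);
    std::vector<int> work = {latch};
    while (!work.empty()) {
        int x = work.back();
        work.pop_back();
        for (int p : pred[x])
            if (body.insert(p).second) work.push_back(p);
    }
    return body;
}
```

For a two-way branch block `b`, `immediate(pdom, b)` (with `pdom = dominators(reversed(g), exit)`) is the block where both arms rejoin. That join block is the usual anchor for recovering an if...then...else. Take the graph `{{1}, {2, 6}, {3, 4}, {5}, {5}, {1}, {}}` (entry 0, exit 6). There, 5 -> 1 is the only back edge, its loop body is {1, 2, 3, 4, 5}, and the arms of block 2 rejoin at block 5.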

> 4) Once structure is obtained, data types can be analyzed.

The only LLVM-based tool I know of that could help here is DSA (Data 
Structure Analysis), a type-inference/points-to analysis.  However, 
since you're reversing everything from binary code, I doubt DSA's type 
inference will work well; I don't expect it to find many (if any) 
high-level types such as structs or arrays of structs.

You might be able to build a more sophisticated analysis yourself, but 
you'll pretty much be on your own.

> 5) Lastly, source code is output in C or C++ or whatever is needed.

LLVM has had facilities for converting LLVM IR to C or C++ code (the C 
backend was recently removed; there might still be a C++ backend, but 
I'm not sure).  However, they were designed primarily for targeting 
systems for which LLVM does not provide a native code generator, so the 
C/C++ code they output isn't very readable.

-- John T.

>
> I was wondering if LLVM could help with any of these steps.
> I am looking at doing step (3) better. Can LLVM help in that area?
>
> Kind Regards
>
> James
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



