[LLVMdev] Reverse engineering for LLVM bit-code

Mon Oct 14 23:07:57 PDT 2013

On 10/14/2013 9:31 PM, Wan, Xiaofei wrote:
> Hi,
>
> I am interested in whether LLVM bit-code is ready to serve as a distribution format (stored in a software distribution package). Is it easy to convert LLVM IR back into C/C++ source code, as can be done with Java bytecode? My understanding is as follows:
> 1. LLVM IR is more like assembly code, so it is not easy to reverse engineer.

IDA and Hex-Rays show that it is entirely possible to reverse engineer
assembly code (at least the assembly that comes out of a C/C++ compiler)
into C/C++ code. But even though that's the question you asked, it's not
what you meant to ask. What makes Java easy to reverse engineer is that
it retains the full structural typing and names of the original program [1].
LLVM IR lacks names for the fields of struct types, although it does
retain struct names and global names. Beyond that, optimization passes
will render SSA value names completely illegible, and they often appear
to destroy a fair amount of the structural typing too.
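To make the point about field names concrete: for a C declaration such as `struct Point { int x; int y; };`, clang typically emits `%struct.Point = type { i32, i32 }`, and field accesses become `getelementptr` instructions that refer to fields by index. The Python sketch below uses a hypothetical helper, `recover_struct` (not part of any LLVM tool), to show what a decompiler is left with: it can recover the struct name, but it has to invent placeholder field names. It assumes a simple, non-packed struct whose fields are not themselves aggregates.

```python
import re

# Unquoted LLVM identifiers: [-a-zA-Z$._][-a-zA-Z$._0-9]*
_IDENT = r"[-A-Za-z$._][-A-Za-z$._0-9]*"
# A named struct definition in textual IR, e.g. "%struct.Point = type { i32, i32 }".
_STRUCT_DEF = re.compile(r"^%(" + _IDENT + r")\s*=\s*type\s*\{\s*(.*?)\s*\}$")


def recover_struct(line: str) -> str:
    """Render a simple IR struct definition as a C-like declaration.

    The struct name survives in the IR, but fields are purely positional,
    so placeholder names (field_0, field_1, ...) have to be invented.
    """
    m = _STRUCT_DEF.match(line.strip())
    if m is None:
        raise ValueError(f"not a simple struct definition: {line!r}")
    name, body = m.groups()
    # clang prefixes C struct types with "struct."; drop it for readability.
    name = name.removeprefix("struct.")
    # Assumption: no nested aggregates, so splitting on commas is safe.
    field_types = [t.strip() for t in body.split(",") if t.strip()]
    fields = "".join(f"    {t} field_{i};\n" for i, t in enumerate(field_types))
    return f"struct {name} {{\n{fields}}};"


if __name__ == "__main__":
    print(recover_struct("%struct.Point = type { i32, i32 }"))
```

For the `Point` example this prints a `struct Point` with members `field_0` and `field_1`: the names `x` and `y` cannot be recovered from the IR alone.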

> 2. If it is easy to reverse engineer, does that mean it is not suitable as a distribution format? Or must code obfuscation be added at the IR level?

If you are super-paranoid about reverse engineering, replace all function names with garbage names and all types with equivalent i8 arrays. The resulting IR will be pretty much exactly as informative about the original source code as the corresponding assembly.
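A minimal sketch of the first half of that suggestion, in Python, operating on textual IR (`.ll`) rather than bitcode. The function `garble_internal_functions` is hypothetical, and it deliberately renames only functions defined with `internal` or `private` linkage: externally visible functions (such as `main` or an exported API) and intrinsics (`@llvm.*`) must keep their names, or linking breaks. It is a regex-based illustration that ignores quoted names, string constants, and comments; a more robust approach would presumably work on the parsed module instead. The type-flattening half of the suggestion is not attempted.

```python
import re

# Unquoted LLVM identifiers: [-a-zA-Z$._][-a-zA-Z$._0-9]*
_IDENT = r"[-A-Za-z$._][-A-Za-z$._0-9]*"
# Any reference to a global symbol, e.g. "@compute_checksum".
_GLOBAL_REF = re.compile(r"@(" + _IDENT + r")")
# Definitions with internal/private linkage are invisible to the linker,
# so renaming them cannot break references from other modules.
_LOCAL_DEFINE = re.compile(
    r"^define\s+(?:internal|private)\b[^@\n]*@(" + _IDENT + r")", re.MULTILINE
)


def garble_internal_functions(ir: str) -> str:
    """Consistently rename internal/private functions to opaque names (f0, f1, ...)."""
    in_use = set(_GLOBAL_REF.findall(ir))  # every global name already in the module
    mapping: dict[str, str] = {}
    counter = 0
    for name in _LOCAL_DEFINE.findall(ir):
        if name in mapping:
            continue
        # Skip candidate names that would clash with an existing symbol.
        while f"f{counter}" in in_use:
            counter += 1
        mapping[name] = f"f{counter}"
        counter += 1
    # Rewrite the definition and every use (calls, address-taken references).
    return _GLOBAL_REF.sub(lambda m: "@" + mapping.get(m.group(1), m.group(1)), ir)
```

On a module where an internal `@compute_checksum` is called from `@main`, both the definition and the call become `@f0`, while `@main` keeps its name.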

[1] Due to a version control bug, I ended up losing the source code to my C++ project while retaining the resulting library. I found that library much easier to decompile than the obfuscated Java bytecode I once set myself the task of decompiling (where the only obfuscation that posed a meaningful barrier to comprehension was name obfuscation).

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist