[LLVMdev] help decompiling x86 ASM to LLVM IR
Joshua Cranmer 🐧
Pidgeot18 at gmail.com
Tue Mar 12 09:45:37 PDT 2013
On 3/12/2013 11:20 AM, James Courtier-Dutton wrote:
> It compiles with GCC -O2 to (rather cleverly removing any branches):
> 0000000000000000 <test61>:
> 0: 83 ff 01 cmp $0x1,%edi
> 3: 19 c0 sbb %eax,%eax
> 5: 83 e0 df and $0xffffffdf,%eax
> 8: 83 c0 61 add $0x61,%eax
> b: c3 retq
>
> How would I represent the SBB instruction in LLVM IR?
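If you're decompiling assembly into IR, it is best to treat the EFLAGS
register as just another register that instructions manipulate as a
side effect, and let a dead-code elimination pass remove the uses that
turn out to be extraneous. A rough LLVM IR equivalent in this case
would be:
define i32 @test61(i32 %edi, i32 %eax) {  ; incoming %eax modeled as an argument; its value cancels out
  %cf = icmp ult i32 %edi, 1      ; cmp $0x1,%edi: CF = (%edi <u 1), i.e. %edi == 0
  %eax2 = sub i32 %eax, %eax      ; sbb %eax,%eax: %eax - %eax ...
  %cf32 = zext i1 %cf to i32
  %eax3 = sub i32 %eax2, %cf32    ; ... - CF, giving 0 or -1
  %eax4 = and i32 %eax3, -33      ; and $0xffffffdf,%eax
  %eax5 = add i32 %eax4, 97       ; add $0x61,%eax
  ret i32 %eax5                   ; 0x40 if %edi == 0, else 0x61
}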
> The aim is to be able to then recompile it to maybe a different target.
> The aim is to go from binary -> LLVM IR -> binary for cases where the
> C source code is not available or lost.
I know qemu can use LLVM IR as an intermediate form for optimizing
emulation; you might want to look into their source code. Or just use
qemu outright.
>
> I.e. a binary is available for 32-bit x86; retarget it to ARM or x86-64.
> The LLVM IR should be target agnostic, but would permit the
> retargeting task without having to build an AST and structure it as a C or
> C++ source code program.
Retargeting binaries to different hardware sounds like a losing
proposition to me, especially if you're trying to retarget 32-bit x86
code to x86-64: one problem is code that behaves as if sizeof(void*) ==
4, when the correct value there is 8. The only safe way to do this is to
effectively emulate the original target machine... which is more or less
what qemu does.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist