[LLVMdev] help decompiling x86 ASM to LLVM IR

Tue Mar 12 09:45:37 PDT 2013

On 3/12/2013 11:20 AM, James Courtier-Dutton wrote:
> It compiles with GCC -O2 to (rather cleverly removing any branches):
> 0000000000000000 <test61>:
>     0:   83 ff 01                cmp    $0x1,%edi
>     3:   19 c0                   sbb    %eax,%eax
>     5:   83 e0 df                and    $0xffffffdf,%eax
>     8:   83 c0 61                add    $0x61,%eax
>     b:   c3                      retq
>
> How would I represent the SBB instruction in LLVM IR?

If you're decompiling assembly language into IR, it is best to treat 
the EFLAGS register as just another register that instructions modify as 
a side effect, and to let a dead-code elimination pass remove the flag 
computations that are never read. A rough LLVM IR equivalent for this 
case would be
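%cf = icmp ult i32 %edi, 1       ; cmp $0x1,%edi: CF is set when %edi < 1 (unsigned)
%eax2 = sub i32 %eax, %eax       ; sbb %eax,%eax: first %eax - %eax ...
%cf32 = zext i1 %cf to i32
%eax3 = sub i32 %eax2, %cf32     ; ... then subtract the carry
%eax4 = and i32 %eax3, -33       ; and $0xffffffdf,%eax
%eax5 = add i32 %eax4, 97        ; add $0x61,%eax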
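
The same flag-as-a-register idea can be sketched in C, which makes it easy to check the translation against the disassembly quoted above. This is only an illustration under stated assumptions: the cpu_state struct and the name test61_emulated are hypothetical, not taken from qemu or any real lifter, and only the carry flag is modelled because it is the only flag this sequence reads.

```c
#include <stdint.h>

/* Hypothetical register file; only the state this sequence touches. */
struct cpu_state {
    uint32_t eax;
    uint32_t edi;
    uint32_t cf;   /* carry flag, kept as 0 or 1 */
};

/* Emulates test61 one instruction at a time (illustrative name). */
uint32_t test61_emulated(uint32_t arg)
{
    struct cpu_state s = { .eax = 0, .edi = arg, .cf = 0 };

    /* cmp $0x1,%edi : computes edi - 1; CF is the unsigned borrow */
    s.cf = (s.edi < 1u);

    /* sbb %eax,%eax : eax = eax - eax - CF, so 0 or 0xffffffff */
    s.eax = s.eax - s.eax - s.cf;

    /* and $0xffffffdf,%eax */
    s.eax &= 0xffffffdfu;

    /* add $0x61,%eax (wraps modulo 2^32) */
    s.eax += 0x61u;

    return s.eax;
}
```

With this sketch, test61_emulated(0) yields 0x40 and any nonzero argument yields 0x61. The flags that sbb, and, and add would also write are never read afterwards, which is exactly the kind of dead computation a dead-code elimination pass would drop.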

> The aim is to be able to then recompile it to maybe a different target.
> The aim is to go from binary -> LLVM IR -> binary for cases where the
> C source code it not available or lost.
I know qemu can use LLVM IR as an intermediate form for optimizing 
emulation; you might want to look into their source code. Or actually 
just outright use qemu.

>
> I.e. binary available for x86 32 bit.  Re-target it to ARM or x86-64bit.
> The LLVM IR should be target agnostic, but would permit the
> re-targetting task without having to build AST and structure as a C or
> C++ source code program.
Retargeting binaries for different hardware sounds like a losing 
proposition to me, especially if you're trying to retarget x86 binary 
code to x86-64: problems here include code acting as if sizeof(void*) == 
4 instead of the correct value of 8. The only safe way to do this is to 
effectively emulate the original target machine, which is more or less 
what qemu does.
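
One way to picture the pointer-size problem is with a structure layout. The sketch below is an assumption-laden illustration: struct node and the constant VALUE_OFFSET_IN_X86_BINARY are hypothetical and stand in for whatever layout a 32-bit compiler baked into the original machine code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure the original 32-bit program might have used. */
struct node {
    void   *next;
    int32_t value;
};

/* Offset of `value` as a 32-bit x86 compiler would have hard-coded it
   into load/store instructions, assuming a 4-byte pointer. */
enum { VALUE_OFFSET_IN_X86_BINARY = 4 };

/* Returns 1 if the hard-coded offset still matches the layout on the
   machine this file is compiled for, 0 otherwise. */
int baked_in_offset_still_valid(void)
{
    return offsetof(struct node, value) == VALUE_OFFSET_IN_X86_BINARY;
}
```

On a target with 4-byte pointers the function returns 1; on a typical x86-64 target, where the pointer occupies 8 bytes, it returns 0. A lifted binary only contains the constant 4, not the structure definition, so the translator has no direct way to know the offset needs to change.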

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist