[LLVMdev] help decompiling x86 ASM to LLVM IR

James Courtier-Dutton james.dutton at gmail.com
Tue Mar 12 09:20:26 PDT 2013


Hi,

I am looking to decompile x86 ASM to LLVM IR.
The original C is this:
int test61 ( unsigned value ) {
        int ret;
        if (value < 1)
                ret = 0x40;
        else
                ret = 0x61;
        return ret;
}

With GCC -O2 it compiles to the following, which rather cleverly avoids any branches:
0000000000000000 <test61>:
   0:   83 ff 01                cmp    $0x1,%edi
   3:   19 c0                   sbb    %eax,%eax
   5:   83 e0 df                and    $0xffffffdf,%eax
   8:   83 c0 61                add    $0x61,%eax
   b:   c3                      retq

How would I represent the SBB instruction in LLVM IR?
Would I have to first convert the ASM to something like:
   0000000000000000 <test61>:
   0:                   cmp    $0x1,%edi        Block A
   1:                   jb     4:               Block A
   2:                   mov    $0x61,%eax       Block B
   3:                   jmp    5:               Block B
   4:                   mov    $0x40,%eax       Block C
   5:                   retq                    Block D  (Due to join point)

...before I could convert it to LLVM IR?
I.e. rewrite it in such a way that the SBB instruction is not needed.
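
One possible alternative to the branch rewrite is to keep the instructions as they are and model the carry flag as an ordinary value. Below is a minimal sketch of that idea, written in C rather than LLVM IR so that it can be compiled and checked. The names (x86_state, op_cmp, op_sbb, op_and, op_add, test61_lifted) are hypothetical and only CF is modelled; the other flags are ignored. In LLVM IR the same shape would presumably be an icmp ult producing an i1 for CF, a zext of that bit, and plain sub/and/add instructions, but that mapping is an assumption, not something I have tested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the only x86 state this fragment touches. */
struct x86_state {
    uint32_t eax;
    uint32_t edi;
    uint32_t cf;    /* carry flag, always 0 or 1 */
};

/* cmp src,dst (AT&T order): computes dst - src and keeps only the flags.
   CF is set when the unsigned subtraction needs a borrow. */
static void op_cmp(struct x86_state *s, uint32_t dst, uint32_t src) {
    s->cf = dst < src;
}

/* sbb src,dst: result = dst - src - CF; CF becomes the borrow out. */
static uint32_t op_sbb(struct x86_state *s, uint32_t dst, uint32_t src) {
    uint32_t cf_in = s->cf;
    s->cf = (uint64_t)dst < (uint64_t)src + cf_in;
    return dst - src - cf_in;
}

/* and src,dst: bitwise AND; CF is cleared. */
static uint32_t op_and(struct x86_state *s, uint32_t dst, uint32_t src) {
    s->cf = 0;
    return dst & src;
}

/* add src,dst: result = dst + src; CF is the unsigned carry out. */
static uint32_t op_add(struct x86_state *s, uint32_t dst, uint32_t src) {
    uint32_t result = dst + src;
    s->cf = result < dst;
    return result;
}

/* The original C function, kept for comparison. */
static int test61_original(unsigned value) {
    return (value < 1) ? 0x40 : 0x61;
}

/* Instruction-by-instruction translation of the GCC output.
   eax starts at 0 here, but "sbb %eax,%eax" gives -CF for any eax. */
static int test61_lifted(unsigned value) {
    struct x86_state s = { 0, value, 0 };
    op_cmp(&s, s.edi, 1);                    /* cmp $0x1,%edi        */
    s.eax = op_sbb(&s, s.eax, s.eax);        /* sbb %eax,%eax        */
    s.eax = op_and(&s, s.eax, 0xffffffdfu);  /* and $0xffffffdf,%eax */
    s.eax = op_add(&s, s.eax, 0x61);         /* add $0x61,%eax       */
    return (int)s.eax;
}

/* Compare both versions on a few boundary values. */
static void check_equivalence(void) {
    const unsigned samples[] = { 0u, 1u, 2u, 0x7fffffffu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(test61_lifted(samples[i]) == test61_original(samples[i]));
}
```

With value == 0 the cmp sets CF, the sbb turns that into 0xffffffff, the and leaves 0xffffffdf, and the add wraps around to 0x40. For any other value CF is clear, eax stays 0 through the and, and the add gives 0x61. So no control flow is needed to express the sequence, at the cost of carrying the flag around explicitly.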

The aim is to be able to recompile the result, possibly for a different
target: binary -> LLVM IR -> binary, for cases where the C source code
is not available or has been lost.

For example, take a binary that is only available for 32-bit x86 and
retarget it to ARM or x86-64.
The LLVM IR should be target-agnostic, and would permit retargeting
without having to reconstruct an AST and the structure of a C or C++
source program.

Any comments?

James


