[LLVMdev] help decompiling x86 ASM to LLVM IR
Óscar Fuentes
ofv at wanadoo.es
Tue Mar 12 09:39:54 PDT 2013
James Courtier-Dutton <james.dutton at gmail.com> writes:
> I am looking to decompile x86 ASM to LLVM IR.
> The original C is this:
> int test61 ( unsigned value ) {
>     int ret;
>     if (value < 1)
>         ret = 0x40;
>     else
>         ret = 0x61;
>     return ret;
> }
>
> It compiles with GCC -O2 to (rather cleverly removing any branches):
> 0000000000000000 <test61>:
>    0:   83 ff 01    cmp    $0x1,%edi
>    3:   19 c0       sbb    %eax,%eax
>    5:   83 e0 df    and    $0xffffffdf,%eax
>    8:   83 c0 61    add    $0x61,%eax
>    b:   c3          retq
>
> How would I represent the SBB instruction in LLVM IR?
> Would I have to first convert the ASM to something like:
> 0000000000000000 <test61>:
>    0:   cmp    $0x1,%edi     Block A
>    1:   jb     4:            Block A
>    2:   mov    $0x61,%eax    Block B
>    3:   jmp    5:            Block B
>    4:   mov    $0x40,%eax    Block C
>    5:   retq                 Block D (Due to join point)
>
> ...before I could convert it to LLVM IR ?
> I.e. Re-write it in such a way as to not need the SBB instruction.
>
> The aim is to be able to then recompile it to maybe a different target.
> The aim is to go from binary -> LLVM IR -> binary for cases where the
> C source code is not available or lost.
>
> I.e. binary available for 32-bit x86. Retarget it to ARM or x86-64.
> The LLVM IR should be target agnostic, but would permit the
> retargeting task without having to build an AST and structure as a C or
> C++ source code program.
>
> Any comments?
This is not possible, except for specific cases.
Consider this code:
long foo(long *p) {
++p;
return *p;
}
The 32-bit x86 machine code would do something like
    add $4, %eax
for `++p', but for x86_64 it would be
    add $8, %rax
But from the machine code alone you can't know that the constant 4 is
sizeof(long) rather than a plain integer; you need the original C code
for that.
And that's the simplest case.
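To make the point concrete, here is a minimal C sketch. The helper names long_pointer_stride and check_stride are made up for illustration and are not part of the original message. It measures how many bytes `++p' advances a long pointer on whatever target it is compiled for; that number is chosen by the compiler from the type, not by anything visible in the emitted add instruction.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the message above. */
long foo(long *p) {
    ++p;
    return *p;
}

/* Hypothetical helper: number of bytes that ++p advances a long*
   on the target this file is compiled for. */
size_t long_pointer_stride(void) {
    long a[2] = {0, 0};
    long *p = a;
    ++p;
    return (size_t)((char *)p - (char *)a);
}

/* Hypothetical self-check: the stride always equals sizeof(long),
   whatever that happens to be on this target. */
void check_stride(void) {
    long a[2] = {10, 20};
    assert(foo(a) == 20);
    assert(long_pointer_stride() == sizeof(long));
}
```

On a typical 32-bit x86 (ILP32) target the stride is 4, and on x86_64 Linux (LP64) it is 8. The source is identical in both cases, which is exactly the information a decompiler working from the binary does not have.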
The gist is that the assembly code does not contain enough semantic
information.
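
The SBB sequence in the question looks like one of the specific cases where a direct translation does work, because it is pure integer arithmetic with no pointers involved. As a rough sketch (the names test61_from_asm and check_sbb_model are invented here), each instruction can be modelled without any branch:

```c
#include <assert.h>
#include <stdint.h>

/* Original source from the question. */
int test61(unsigned value) {
    return (value < 1) ? 0x40 : 0x61;
}

/* Hypothetical instruction-by-instruction model of the GCC output. */
int test61_from_asm(uint32_t edi) {
    uint32_t cf = (edi < 1u);   /* cmp $0x1,%edi: CF = unsigned borrow of edi - 1 */
    uint32_t eax = 0u - cf;     /* sbb %eax,%eax: eax - eax - CF = 0 or 0xffffffff */
    eax &= 0xffffffdfu;         /* and $0xffffffdf,%eax */
    eax += 0x61u;               /* add $0x61,%eax, wrapping modulo 2^32 */
    return (int)eax;
}

/* Hypothetical self-check: the model agrees with the source on a few inputs. */
void check_sbb_model(void) {
    uint32_t samples[] = {0u, 1u, 2u, 0x7fffffffu, 0xffffffffu};
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(test61_from_asm(samples[i]) == test61(samples[i]));
}
```

In LLVM IR the same steps would presumably map to an icmp ult, a sext of the i1 result to i32, an and, and an add. This covers only the arithmetic of this one function; it does nothing about the missing type information described above.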