[LLVMdev] help decompiling x86 ASM to LLVM IR

James Courtier-Dutton james.dutton at gmail.com
Tue Mar 12 09:55:58 PDT 2013


On 12 March 2013 16:39, Óscar Fuentes <ofv at wanadoo.es> wrote:
>
> This is not possible, except for specific cases.
>
> Consider this code:
>
> long foo(long *p) {
>   ++p;
>   return *p;
> }
>
> The X86 machine code would do something like
>
> add %eax, 4
>
> for `++p', but for x86_64 it would be
>
> add %rax, 8
>
> But you can't know that without looking at the original C code.
>
> And that's the most simple case.
>
> The gist is that the assembly code does not contain enough semantic
> information.
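Here is a small C sketch of the quoted point. `++p` on a `long *` moves the pointer forward by `sizeof(long)` bytes, and that byte count is the immediate the compiler puts in the `add`. The helper `increment_in_bytes` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The function from the quoted message: ++p advances p by one element,
   i.e. by sizeof(long) bytes, before the load. */
long foo(long *p) {
  ++p;
  return *p;
}

/* Hypothetical helper: the byte distance covered by ++p on this target.
   This is the immediate an "add" on the pointer register would carry. */
ptrdiff_t increment_in_bytes(long *p) {
  assert(p != NULL);
  return (char *)(p + 1) - (char *)p;
}
```

`increment_in_bytes` returns 4 on 32-bit x86 and 8 on x86_64 Linux or macOS (LP64). On 64-bit Windows (LLP64), `long` stays at 4 bytes.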

I already know how to handle the case you describe.
I am not converting ASM to LLVM IR without doing quite a lot of analysis first.
1) I can already tell whether a register refers to a pointer or an
integer based on how it is used: does it get dereferenced or not? So
I would know that "p" is a pointer.
2) From the binary, I would know whether it was built for 32-bit or 64-bit.
3) I could then use (1) and (2) to know whether "add %rax, 8" is
"p = p + 1" (with a 64-bit long) or "p = p + 2" (with a 32-bit long).
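The three steps above can be sketched in C under loudly stated assumptions. The `Insn` struct, the opcodes, and the function names are hypothetical. The pointer inference is flow-insensitive: a register dereferenced anywhere is treated as a pointer everywhere, so register reuse is ignored. The pointee size (step 2) is passed in as `elem_size` and is not derived from the binary. Bitness alone does not fix it, because `long` is 4 bytes on 64-bit Windows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction model. */
typedef enum { OP_ADD_IMM, OP_LOAD } Opcode;

typedef struct {
  Opcode op;
  int reg;   /* destination of the add, or base register of the load */
  long imm;  /* byte immediate for OP_ADD_IMM; unused for OP_LOAD */
} Insn;

enum { NUM_REGS = 16 };

/* Step 1: any register used as the base of a load is marked as a pointer
   (flow-insensitive: one dereference anywhere marks it everywhere). */
void infer_pointers(const Insn *code, size_t n, bool is_ptr[NUM_REGS]) {
  for (int r = 0; r < NUM_REGS; r++)
    is_ptr[r] = false;
  for (size_t i = 0; i < n; i++) {
    assert(code[i].reg >= 0 && code[i].reg < NUM_REGS);
    if (code[i].op == OP_LOAD)
      is_ptr[code[i].reg] = true;
  }
}

/* Step 3: rewrite "add reg, imm" on a pointer register as an element
   count, given the assumed pointee size. Returns false if the add is not
   on a pointer or the immediate is not a multiple of elem_size. */
bool elements_added(const Insn *insn, const bool is_ptr[NUM_REGS],
                    long elem_size, long *out) {
  assert(elem_size > 0);
  assert(insn->reg >= 0 && insn->reg < NUM_REGS);
  if (insn->op != OP_ADD_IMM || !is_ptr[insn->reg])
    return false;
  if (insn->imm % elem_size != 0)
    return false;
  *out = insn->imm / elem_size;  /* e.g. 8 / 8 -> p + 1, 8 / 4 -> p + 2 */
  return true;
}
```

Take the quoted example: `add rax, 8` followed by a load through `rax` marks that register as a pointer. With an 8-byte pointee, the add becomes `p = p + 1`. With a 4-byte pointee, it becomes `p = p + 2`.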

So, I think your "It is not possible" is a bit too black and white.
