[LLVMdev] help decompiling x86 ASM to LLVM IR

Tue Mar 12 10:17:07 PDT 2013

James Courtier-Dutton <james.dutton at gmail.com> writes:

> I already know how to handle the case you describe.
> I am not converting ASM to LLVM IR without doing quite a lot of analysis first.
> 1) I can already tell if a register is referring to a pointer or an
> integer based on how it is used. Does it get de-referenced or not? So,
> I would know that "p" is a pointer.
> 2) From the binary, I would know if it was for 32bit or 64bit.
> 3) I could then use (1) and (2) to know if "add %rax, 8" is "p = p +
> 1" (64bit long), or "p = p + 2" (32bit long)
>
> So, I think your "It is not possible" is a bit too black and white.

There is no amount of automated analysis that makes it possible to
"translate" arbitrary binary code from one architecture to another.

The rules you state above would fail for my example. This code:

int foo(int *p) {
   ++p;
   return *p;
}

compiled for 32-bit x86 (Linux or Windows) generates exactly the same
binary code as

long foo(long *p) {
   ++p;
   return *p;
}

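but those functions generate different code on x86_64-linux, where `int'
is 32 bits and `long' is 64 bits. On 32-bit x86 both `int' and `long' are
32 bits, so `++p' becomes the same 4-byte pointer increment either way. In
the general case, it is infeasible to decide whether `p' is a pointer to
`int' or to `long' on 32-bit x86.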
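To make the point concrete, here is a small C sketch that measures how many
bytes `++p' advances an `int *' and a `long *' on whatever target it is
compiled for. The helper names `int_ptr_step', `long_ptr_step' and
`step_is_ambiguous' are made up for illustration. Built for 32-bit x86 both
steps come out as 4 bytes, so a 4-byte increment seen in the binary fits
either type; built for x86_64-linux they come out as 4 and 8.

```c
#include <stddef.h>

/* Hypothetical helper: byte distance covered by `++p' for an `int *'
   on the target this file is compiled for. */
size_t int_ptr_step(void)
{
    int buf[2];
    int *p = buf;
    ++p;                                /* same statement as in foo() */
    return (size_t)((char *)p - (char *)buf);
}

/* Hypothetical helper: the same measurement for a `long *'. */
size_t long_ptr_step(void)
{
    long buf[2];
    long *p = buf;
    ++p;
    return (size_t)((char *)p - (char *)buf);
}

/* Nonzero when a pointer increment of `step' bytes is consistent with
   both `int *' and `long *' on this target, i.e. the increment alone
   cannot tell the two types apart. */
int step_is_ambiguous(size_t step)
{
    return step == sizeof(int) && step == sizeof(long);
}
```

For example, step_is_ambiguous(int_ptr_step()) returns nonzero when built
for 32-bit x86 and zero when built for x86_64-linux (it would also return
nonzero on 64-bit Windows, where `long' stays 32 bits), so the instruction
stream by itself does not fix the pointee type.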

There are lots and lots of examples of that kind. Other types of problems
include translating ABI-related code, reflecting external data structures...