[LLVMdev] help decompiling x86 ASM to LLVM IR
Joshua Cranmer 🐧
Pidgeot18 at gmail.com
Tue Mar 12 10:10:34 PDT 2013
On 3/12/2013 11:55 AM, James Courtier-Dutton wrote:
> I already know how to handle the case you describe.
> I am not converting ASM to LLVM IR without doing quite a lot of analysis first.
> 1) I can already tell if a register is referring to a pointer or an
> integer based on how it is used. Does it get de-referenced or not? So,
> I would know that "p" is a pointer.
What if the variable is being loaded out of a memory location, and the
current use increments it by four but never dereferences it, while some
other location dereferences it?
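
To make that case concrete, here is a minimal sketch. The names cursor, counter, advance_cursor, bump_counter and read_cursor are made up for illustration. Assuming a 32-bit x86 target with a 4-byte int, the two update functions would typically compile to the same kind of instruction (an add of 4 to a 32-bit word in memory), and only read_cursor, which may sit in an entirely different part of the binary, shows that one of the words is a pointer.

```c
#include <stdint.h>

int *cursor;        /* a pointer kept in memory */
uint32_t counter;   /* an ordinary integer kept in memory */

/* Advances the pointer by one int, i.e. adds sizeof(int) to the stored
 * value, without ever dereferencing it here. */
void advance_cursor(void) { cursor += 1; }

/* Adds 4 to an integer. With a 4-byte int and 4-byte pointers this is
 * the same arithmetic on the stored bits as advance_cursor. */
void bump_counter(void) { counter += 4; }

/* The only place where the pointer nature of cursor becomes visible. */
int read_cursor(void) { return *cursor; }
```

An analysis that looks only at the update site has nothing to separate the two cases; it has to connect that site to the distant dereference through memory.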
What if (in x86-64 code) the program clears the low three bits of the
pointer because it uses them as scratchpad space for a few tracking bits?
In 32-bit code, that's unsafe, since you can only guarantee two unused bits.
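
A minimal sketch of that tagging trick, assuming the pointed-to objects are 8-byte aligned so the low three bits of the address are always zero. TAG_MASK, tag_ptr, get_tag and untag_ptr are hypothetical names, not taken from any particular codebase.

```c
#include <assert.h>
#include <stdint.h>

/* Low three bits of the address; assumes 8-byte aligned objects. */
#define TAG_MASK ((uintptr_t)7)

/* Store a small tag in the unused low bits of an aligned pointer. */
static void *tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* alignment assumption must hold */
    return (void *)(bits | ((uintptr_t)tag & TAG_MASK));
}

/* Read the tag back out of a tagged pointer. */
static unsigned get_tag(const void *p) {
    return (unsigned)((uintptr_t)p & TAG_MASK);
}

/* Clear the low three bits to recover the real address. */
static void *untag_ptr(void *p) {
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

In the binary, tag_ptr and untag_ptr are just an OR and an AND with constants, which is integer-looking arithmetic applied to a value that is still a pointer. The constant 7 also bakes in the alignment assumption, so a port to a target that only guarantees 4-byte alignment would have to change it to 3.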
What if you have a pointer variable in the middle of the struct, so you
need to shift the data offset of a pointer-relative address to get the
correct variable?
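
One possible reading of that case, sketched under assumptions: the code holds the address of a pointer field that sits in the middle of a struct and reaches a neighbouring field by adjusting that address by a compile-time offset. struct node and key_from_payload_slot are hypothetical names.

```c
#include <stddef.h>

struct node {
    int  key;
    int *payload;   /* pointer field in the middle of the struct */
    int  flags;
};

/* Given the address of the payload field, step back to the start of the
 * enclosing struct and read key. The subtracted offset depends on the
 * pointer width and alignment of the target. */
static int key_from_payload_slot(int **slot) {
    struct node *n =
        (struct node *)((char *)slot - offsetof(struct node, payload));
    return n->key;
}
```

With 4-byte pointers the offset of payload is typically 4; with 8-byte pointers it is typically 8 because of alignment padding after key. The assembly only contains the resulting constant, not the reason for it.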
What if you have the equivalent assembly code for this C code:
union {
    struct {
        int *a;
        int b;
    };
    struct {
        int c;
        int d;
    };
} x;
...
switch () {
case A: return &x.b;
case B: return &x.d;
}
After optimization, cases A and B reduce to the same assembly in 32-bit
code, where b and d both sit at offset 4, but not in 64-bit code, where
the 8-byte pointer pushes b to offset 8.
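
The offsets behind that claim can be checked with a small sketch. To make both layouts visible on a single machine, it replaces int *a with a fixed-width integer of the corresponding pointer size and names the inner structs; layout32, layout64 and the offset_* helpers are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with a 4-byte stand-in for the pointer (32-bit code). */
union layout32 {
    struct { uint32_t a; int32_t b; } s1;
    struct { int32_t  c; int32_t d; } s2;
};

/* Layout with an 8-byte stand-in for the pointer (64-bit code). */
union layout64 {
    struct { uint64_t a; int32_t b; } s1;
    struct { int32_t  c; int32_t d; } s2;
};

/* Byte offsets of b and d from the start of each union. */
size_t offset_b_32(void) { return offsetof(union layout32, s1.b); }
size_t offset_d_32(void) { return offsetof(union layout32, s2.d); }
size_t offset_b_64(void) { return offsetof(union layout64, s1.b); }
size_t offset_d_64(void) { return offsetof(union layout64, s2.d); }
```

In the 32-bit layout both offsets are 4, so an optimizer can merge the two cases into one address computation. In the 64-bit layout b sits at 8 and d at 4, so the merged code has to be split apart again, and nothing in the 32-bit binary says which uses belonged to which case.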
How would you propose to detect and fix these cases?
> 2) From the binary, I would know if it was for 32bit or 64bit.
> 3) I could then use (1) and (2) to know if "add %rax, 8" is "p = p +
> 1" (64bit long), or "p = p + 2(32bit long)"
>
> So, I think your "It is not possible" is a bit too black and white.
No, it's AI-hard, as evidenced by the fact that porting programs from
32-bit to 64-bit at the source-code level is nontrivial for large projects
with lots of developers. And you have even less information at the
assembly level.
--
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist