[LLVMdev] help decompiling x86 ASM to LLVM IR

Joshua Cranmer 🐧 Pidgeot18 at gmail.com
Tue Mar 12 10:10:34 PDT 2013


On 3/12/2013 11:55 AM, James Courtier-Dutton wrote:
> I already know how to handle the case you describe.
> I am not converting ASM to LLVM IR without doing quite a lot of analysis first.
> 1) I can already tell if a register is referring to a pointer or an
> integer based on how it is used. Does it get de-referenced or not? So,
> I would know that "p" is a pointer.
What if the variable is loaded out of a memory location, and the
current use increments it by four but never dereferences it, while some
other location dereferences it?
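
A minimal sketch of that situation, with made-up names (struct cursor,
advance, and current are purely illustrative, not taken from any real
program). The function that does the arithmetic never touches the
pointee, so looking at it alone tells you nothing about whether the
value is a pointer:

```c
/* Hypothetical example: the pointer lives in memory, not in a register. */
struct cursor {
    int *pos;
};

/* Load, add sizeof(int) (4 on x86 and x86-64), store. No dereference. */
static void advance(struct cursor *c)
{
    c->pos += 1;
}

/* The only dereference is here, possibly in a different module. */
static int current(const struct cursor *c)
{
    return *c->pos;
}
```

Seen in isolation, advance() looks just like "add 4 to an integer field".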

What if (in x86-64 code) the program masks off the low three bits of a
pointer because it uses them as scratchpad space for a few tracking
bits? In 32-bit code, that's unsafe, since you can only guarantee two
unused bits.
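
For reference, a rough sketch of the tagging idiom, assuming the
pointee is at least 8-byte aligned (tag_ptr, untag_ptr, ptr_tag, and
TAG_MASK are hypothetical names). After compilation, all that is left
is an AND or an OR on a register, which looks like integer arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8-byte alignment, so the low three address bits are zero. */
#define TAG_MASK ((uintptr_t)0x7)

/* Stash a small tag (0..7) in the unused low bits of the pointer. */
static void *tag_ptr(void *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);   /* must be 8-byte aligned */
    assert(tag <= TAG_MASK);
    return (void *)(bits | tag);
}

/* Clear the low three bits to recover the real address. */
static void *untag_ptr(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}

/* Extract the tracking bits. */
static unsigned ptr_tag(void *p)
{
    return (unsigned)((uintptr_t)p & TAG_MASK);
}
```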

What if you have a pointer variable in the middle of a struct, so you
need to shift the offset in a pointer-relative address to reach the
correct field?
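
As an illustration (struct record and its fields are invented for this
sketch), every field after the pointer moves when the pointer size
changes. With the usual x86 ABIs, count sits at offset 8 in 32-bit code
and at offset 16 in 64-bit code:

```c
#include <stddef.h>

/* Hypothetical layout with a pointer in the middle. */
struct record {
    int  id;
    int *data;    /* 4 bytes on ILP32, 8 bytes (plus padding before) on LP64 */
    int  count;
};

/* Compiles to one load at [s + offsetof(struct record, count)]. */
static int record_count(const struct record *s)
{
    return s->count;
}

/* The displacement a decompiler would have to rewrite when retargeting. */
static size_t count_offset(void)
{
    return offsetof(struct record, count);
}
```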

What if you have the equivalent assembly code for this C code:
union {
   struct {
     int *a;
     int b;
   };
   struct {
     int c;
     int d;
   };
} *x;

...
switch () {
  case A: return &x->b;
  case B: return &x->d;
}

After optimization, cases A and B reduce to the same assembly in 32-bit 
code but not in 64-bit code.
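
The layout difference can be checked with offsetof. This is only a
sketch: union u and same_address_in_this_abi are hypothetical names,
and it relies on C11 anonymous structs as in the snippet above:

```c
#include <stddef.h>

union u {
    struct {
        int *a;
        int  b;
    };
    struct {
        int c;
        int d;
    };
};

/*
 * Nonzero when &x->b and &x->d are the same address for this ABI.
 * With the usual x86 ABIs: ILP32 puts b and d both at offset 4,
 * while LP64 puts b at offset 8 and d at offset 4.
 */
static int same_address_in_this_abi(void)
{
    return offsetof(union u, b) == offsetof(union u, d);
}
```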

How would you propose to detect and fix these cases?

> 2) From the binary, I would know if it was for 32bit or 64bit.
> 3) I could then use (1) and (2) to know if "add %rax, 8" is "p = p +
> 1" (64bit long), or "p = p + 2(32bit long)"
>
> So, I think your "It is not possible" is a bit too black and white.

No, it's AI-hard, as evidenced by the fact that porting programs from
32-bit to 64-bit at the source-code level is nontrivial for large
projects with lots of developers. And you have even less information at
the assembly level.

-- 
Joshua Cranmer
Thunderbird and DXR developer
Source code archæologist



