[LLVMdev] Lifting ASM to IR

mats petersson mats at planetcatfish.com
Fri Mar 13 09:58:34 PDT 2015


Having spent my formative years (or whatever it's called) many years
back writing a disassembler that produced "re-assemblable" code (that
worked most of the time), I'm quite familiar with the fundamentals of
simply knowing "what is read-only data, what is instructions". I had
to invent several heuristics to solve for example jump-tables [switch
statement] that are PC-relative or text-strings embedded in the "code"
section  - both typically "look kind of like instructions, but aren't
really" - strings are a little less difficult than jump-tables, but
both can be very tricky - especially if they are relatively short
sequences (is "F\n" an instruction or string?)

If you then want to abstract it further, to a higher level language
(e.g LLVM IR), you have to actually undersstand the semantics of each
instruction.

Calls are typically not so bad - you can figure out from the system
calling convention what is part of the call and what isn't (assuming
the calling convention of the compiler is documented, of course)

--
Mats

On 13 March 2015 at 16:26, yekkas <yekkas at gmail.com> wrote:
> On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:
>> what if void* isn't the same size between the two targets?
>
> I believe that the translated program has no other choice but to use the
> same
> pointer size and data layout as the original one. Though, memory operations
> have
> to be modified to map one memory space into another properly.
>
>
>
> On 03/13/2015 05:47 PM, Jonathan Roelofs wrote:
>>
>>
>>
>> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>>>
>>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>>> <ahmed.bougacha at gmail.com <mailto:ahmed.bougacha at gmail.com>> wrote:
>>>
>>>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>>>     >> Does there exist a tool that could lift a binary (assembly for
>>> some
>>>     >> supported target) to LLVM IR?  If there isn't, does this seem like
>>>     >> something that would be feasible?
>>>
>>>     There's plenty of variations on the idea: Revgen/S2E, Fracture,
>>> Dagger
>>>     (my own), libcpu, several closed-source ones used by pentest shops,
>>>     some that use another representation before going to IR (say
>>>     llvm-qemu),  and probably others still I forgot about.
>>>
>>>     Are you interested in a specific target / use case?
>>>
>>>
>>> I was thinking something along the lines of lifting a binary into IR and
>>> spitting it out for a different processor.
>>
>>
>> This is going to be extremely difficult. Imagine for example how this
>> function would be compiled:
>>
>>   struct Foo {
>>     void *v;
>>     int i;
>>     long l;
>>   };
>>
>>   long bar(Foo *f) {
>>     return f->l;
>>   }
>>
>> If we pick a particular target, and compile this function for that, then
>> 'foo' will have some offset into the struct from which it loads 'l'. This is
>> easy because we know the sizes of the struct's members, and the layout rules
>> for structs on the target.
>>
>> Now turn that around: given an offset into a struct for one target, what's
>> the offset into the same struct on another target? We're stuck because we
>> can't reconstruct from this offset what the sizes of v and i are
>> individually; all we have is their sum (and that doesn't even take alignment
>> issues into account).  This is because when we're looking at the binary we
>> don't know, given that offset, that the two elements in front of it are a
>> void* and an int.
>>
>> Now, you might think that: "well, okay we'll just use the offsets from one
>> target in the other target's binaries". But that isn't going to work either:
>> what if void* isn't the same size between the two targets? And that's just
>> the tip of the iceberg.
>>
>> TL;DR: binary translation is a very difficult problem.
>>
>>
>> Jon
>>
>>>
>>> Or maybe a decompiling tool of some kind.
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev



More information about the llvm-dev mailing list