[LLVMdev] Lifting ASM to IR

Fri Mar 13 09:15:57 PDT 2015

On Fri, Mar 13, 2015 at 7:47 AM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:

>
>
> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>
>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com> wrote:
>>
>>     > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>>     >> Does there exist a tool that could lift a binary (assembly for some
>>     >> supported target) to LLVM IR?  If there isn't, does this seem like
>>     >> something that would be feasible?
>>
>>     There's plenty of variations on the idea: Revgen/S2E, Fracture, Dagger
>>     (my own), libcpu, several closed-source ones used by pentest shops,
>>     some that use another representation before going to IR (say
>>     llvm-qemu),  and probably others still I forgot about.
>>
>>     Are you interested in a specific target / use case?
>>
>>
>> I was thinking something along the lines of lifting a binary into IR and
>> spitting it out for a different processor.
>>
>
> This is going to be extremely difficult. Imagine for example how this
> function would be compiled:
>
>   struct Foo {
>     void *v;
>     int i;
>     long l;
>   };
>
>   long bar(Foo *f) {
>     return f->l;
>   }
>
> If we pick a particular target and compile this function for it, then
> 'bar' will load 'l' from some fixed offset into the struct. Computing that
> offset is easy because we know the sizes of the struct's members and the
> layout rules for structs on the target.
>
> Now turn that around: given an offset into a struct for one target, what's
> the offset into the same struct on another target? We're stuck because we
> can't reconstruct from this offset what the sizes of v and i are
> individually; all we have is their sum (and that doesn't even take
> alignment issues into account).  This is because when we're looking at the
> binary we don't know, given that offset, that the two elements in front of
> it are a void* and an int.
>
> Now, you might think that: "well, okay we'll just use the offsets from one
> target in the other target's binaries". But that isn't going to work
> either: what if void* isn't the same size between the two targets? And
> that's just the tip of the iceberg.
>
> TL;DR: binary translation is a very difficult problem.


I was afraid of something like that.  I was thinking that translating
function calls would be an issue; I didn't think about data layout.
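
To make the data-layout point concrete, here is a minimal sketch that computes
where 'l' would land in the struct Foo quoted above under three data models.
The DataModel type, the helper names, and the size/alignment numbers are
assumptions chosen for illustration (typical ILP32, LP64, and LLP64 values),
not taken from any particular ABI document or from LLVM itself. The layout
rule used is the common one: members in declaration order, each placed at the
next offset that satisfies its alignment.

```cpp
#include <cstddef>

// Hypothetical description of a target's data model: size and alignment
// (in bytes) of the three member types used by Foo.
struct DataModel {
    std::size_t ptr_size, ptr_align;
    std::size_t int_size, int_align;
    std::size_t long_size, long_align;
};

// Round offset up to the next multiple of align (align must be non-zero).
std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Offset of Foo::l, assuming members are laid out in declaration order and
// each one is placed at the next offset that satisfies its alignment.
std::size_t offset_of_l(const DataModel &m) {
    std::size_t off = 0;                // void *v sits at offset 0
    off += m.ptr_size;
    off = align_up(off, m.int_align);   // int i
    off += m.int_size;
    off = align_up(off, m.long_align);  // long l starts here
    return off;
}

// Illustrative data models (assumed values).
const DataModel ilp32{4, 4, 4, 4, 4, 4};  // 32-bit pointers, 32-bit long
const DataModel lp64{8, 8, 4, 4, 8, 8};   // 64-bit pointers, 64-bit long
const DataModel llp64{8, 8, 4, 4, 4, 4};  // 64-bit pointers, 32-bit long
```

Calling offset_of_l on each model gives three different offsets for the same
source-level field. Going forward from the types is mechanical; a lifter that
only sees the constant in the load instruction has no way to tell which member
sizes and padding produced it, so it cannot recompute the offset for another
target.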