<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Mar 13, 2015 at 7:47 AM, Jonathan Roelofs <span dir="ltr"><<a href="mailto:jonathan@codesourcery.com" target="_blank">jonathan@codesourcery.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span><br>

<br>

On 3/12/15 8:14 PM, Daniel Dilts wrote:<br>

</span><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;padding-left:1ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid"><span>

On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha<br></span><span>

<<a href="mailto:ahmed.bougacha@gmail.com" target="_blank">ahmed.bougacha@gmail.com</a>> wrote:<br>

<br>

    > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:<br>

    >> Does there exist a tool that could lift a binary (assembly for some<br>

    >> supported target) to LLVM IR?  If there isn't, does this seem like<br>

    >> something that would be feasible?<br>

<br>

    There's plenty of variations on the idea: Revgen/S2E, Fracture, Dagger<br>

    (my own), libcpu, several closed-source ones used by pentest shops,<br>

    some that use another representation before going to IR (say<br>

    llvm-qemu),  and probably others still I forgot about.<br>

<br>

    Are you interested in a specific target / use case?<br>

<br>

<br>

I was thinking something along the lines of lifting a binary into IR and<br>

spitting it out for a different processor.<br>

</span></blockquote>

<br>

This is going to be extremely difficult. Imagine for example how this function would be compiled:<br>

<br>

  struct Foo {<br>

    void *v;<br>

    int i;<br>

    long l;<br>

  };<br>

<br>

  long bar(Foo *f) {<br>

    return f->l;<br>

  }<br>

<br>

If we pick a particular target and compile this function for it, then 'bar' will load 'l' from a fixed offset into the struct. The compiler can compute that offset easily because it knows the sizes of the struct's members and the struct layout rules for the target.<br>

<br>

Now turn that around: given an offset into a struct for one target, what's the offset into the same struct on another target? We're stuck, because the offset alone doesn't tell us the individual sizes of v and i; at best it gives us their sum, and even that is blurred by alignment padding. Looking only at the binary, we have no way of knowing that the two members in front of 'l' are a void* and an int.<br>

<br>

Now, you might think: "well, okay, we'll just use the offsets from one target in the other target's binaries". But that isn't going to work either: what if void* isn't the same size on the two targets? Then a pointer stored in 'v' may not fit in the space the original layout reserved for it. And that's just the tip of the iceberg.<br>

<br>

TL;DR: binary translation is a very difficult problem.</blockquote><div><br></div><div>I was afraid of something like that.  I was thinking that translating function calls would be an issue; I didn't think about data layout. </div></div><br></div></div>