[LLVMdev] Lifting ASM to IR
Daniel Dilts
diltsman at gmail.com
Fri Mar 13 09:15:57 PDT 2015
On Fri, Mar 13, 2015 at 7:47 AM, Jonathan Roelofs <jonathan at codesourcery.com> wrote:
>
>
> On 3/12/15 8:14 PM, Daniel Dilts wrote:
>
>> On Thu, Mar 12, 2015 at 6:33 PM, Ahmed Bougacha
>> <ahmed.bougacha at gmail.com> wrote:
>>
>> > On Thu, Mar 12, 2015 at 05:44:02PM -0700, Daniel Dilts wrote:
>> >> Does there exist a tool that could lift a binary (assembly for some
>> >> supported target) to LLVM IR? If there isn't, does this seem like
>> >> something that would be feasible?
>>
>> There's plenty of variations on the idea: Revgen/S2E, Fracture, Dagger
>> (my own), libcpu, several closed-source ones used by pentest shops,
>> some that use another representation before going to IR (say
>> llvm-qemu), and probably others still I forgot about.
>>
>> Are you interested in a specific target / use case?
>>
>>
>> I was thinking something along the lines of lifting a binary into IR and
>> spitting it out for a different processor.
>>
>
> This is going to be extremely difficult. Imagine for example how this
> function would be compiled:
>
> struct Foo {
>   void *v;
>   int i;
>   long l;
> };
>
> long bar(Foo *f) {
>   return f->l;
> }
>
> If we pick a particular target, and compile this function for that, then
> 'bar' will have some offset into the struct from which it loads 'l'. This
> is easy because we know the sizes of the struct's members, and the layout
> rules for structs on the target.
>
> Now turn that around: given an offset into a struct for one target, what's
> the offset into the same struct on another target? We're stuck because we
> can't reconstruct from this offset what the sizes of v and i are
> individually; all we have is their sum (and that doesn't even take
> alignment issues into account). This is because when we're looking at the
> binary we don't know, given that offset, that the two elements in front of
> it are a void* and an int.
>
> Now, you might think that: "well, okay we'll just use the offsets from one
> target in the other target's binaries". But that isn't going to work
> either: what if void* isn't the same size between the two targets? And
> that's just the tip of the iceberg.
>
> TL;DR: binary translation is a very difficult problem.

I was afraid of something like that. I was thinking that translating
function calls would be an issue; I didn't think about data layout.
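
To make the data-layout point concrete, here is a minimal sketch (C++14 or later) that computes where 'l' lands in struct Foo under a few common data models. The DataModel struct, the helper names, and the three model constants are hypothetical and exist only for this illustration. The layout rule assumed is the usual one: members in declaration order, each placed at the next offset that satisfies its alignment. Real ABIs have more rules than this.

```cpp
#include <cstddef>

// Hypothetical description of a target's data model: size and alignment
// (in bytes) of the three member types used by struct Foo.
struct DataModel {
    std::size_t ptr_size, ptr_align;    // void *
    std::size_t int_size, int_align;    // int
    std::size_t long_size, long_align;  // long
};

// Round `offset` up to the next multiple of `align`.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Offset of Foo::l, assuming members are laid out in declaration order
// and each one starts at the next suitably aligned offset.
constexpr std::size_t offset_of_l(const DataModel &m) {
    std::size_t off = 0;
    off = align_up(off, m.ptr_align) + m.ptr_size;  // past void *v
    off = align_up(off, m.int_align) + m.int_size;  // past int i
    return align_up(off, m.long_align);             // start of long l
}

// Illustrative data models (assumed values for typical platforms).
constexpr DataModel ilp32{4, 4, 4, 4, 4, 4};  // typical 32-bit Unix-like
constexpr DataModel lp64{8, 8, 4, 4, 8, 8};   // typical 64-bit Unix-like
constexpr DataModel llp64{8, 8, 4, 4, 4, 4};  // 64-bit Windows

static_assert(offset_of_l(ilp32) == 8, "v(4) + i(4)");
static_assert(offset_of_l(lp64) == 16, "v(8) + i(4) + 4 bytes padding");
static_assert(offset_of_l(llp64) == 12, "v(8) + i(4), no padding");
```

Going forward (types to offset) is mechanical, as the static_asserts show. Going backward is not: a lifter looking at an LP64 binary sees only the constant 16 in the load, and that number alone does not say that it came from a pointer, an int, and 4 bytes of padding, so there is no way to turn it into 8 or 12 for another target.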