[LLVMdev] how to transform elf binary to llvm IR?

mats petersson mats at planetcatfish.com
Fri Jul 17 11:50:19 PDT 2015


Shuai: I think we are agreeing - I was just saying that it's very
difficult, but in a different way than how you were saying it. A large part
of the difficulty is that there is "more information" in the higher level
description of code, than there is in the lower level, and the
compiler/translator "removes" (some of) that information when compiling. A
very simple example is:

     struct ab
     {
         int a;
         float b;
     };

     struct ab a;

     foo(a);

will look the same in the generated machine code (with most compilers) as

     int a;
     float b;

     foo(a, b);

Debug information and symbols can of course help here, but if the code
doesn't have that, then there isn't any way to tell `foo(int, float)` from
`foo(struct ab)` as a signature. So whilst it MAY be possible to recreate
the code at a higher level that is functionally equivalent, a lot of the
"helping" features in the high-level language will go missing because the
information was "removed" by the compiler.
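
As a rough illustration of the point above, here is a minimal C sketch (not
taken from any tool; the helper name `layouts_match` is made up for this
example). It assumes a typical platform where `int` and `float` are both
4 bytes with 4-byte alignment, so `struct ab` has no padding. It compares the
raw bytes of a `struct ab` with the bytes of an `int` followed by a `float`,
which is roughly what a stack-based calling convention such as 32-bit x86
cdecl would push for the two calls. On ABIs that pass arguments in registers
the two calls can differ, so treat this as a sketch of the memory-layout
argument only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct ab {
    int a;
    float b;
};

/* Hypothetical helper: returns 1 if a struct ab holding (a, b) has exactly
 * the same bytes as an int immediately followed by a float, else 0. */
int layouts_match(int a, float b)
{
    unsigned char as_struct[64];
    unsigned char as_pair[64];
    struct ab s;

    /* Both representations must fit in the scratch buffers. */
    assert(sizeof s <= sizeof as_struct);
    assert(sizeof a + sizeof b <= sizeof as_pair);

    /* Zero the struct first so any padding bytes are deterministic. */
    memset(&s, 0, sizeof s);
    s.a = a;
    s.b = b;
    memcpy(as_struct, &s, sizeof s);

    /* Two separate scalars laid out back to back. */
    memcpy(as_pair, &a, sizeof a);
    memcpy(as_pair + sizeof a, &b, sizeof b);

    /* If the sizes differ, the struct has padding and the layouts differ. */
    if (sizeof s != sizeof a + sizeof b)
        return 0;
    return memcmp(as_struct, as_pair, sizeof s) == 0;
}
```

Under the stated assumptions, `layouts_match(42, 1.5f)` should return 1: the
bytes carry no trace of whether they came from one struct or from two
separate scalars.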

--
Mats

On 17 July 2015 at 18:31, Shuai Wang <wangshuai901 at gmail.com> wrote:

> Hello Mats,
>
> I am sorry, but I didn't fully get your point. Actually, things have been
> moving forward, and recent research has (marginally) solved some of the
> obstacles raised before.
>
> I have been working on related reverse engineering topics for a while,
> and according to my review there is no open-source tool that can fully
> solve this challenge, even for binaries generated from well-written C
> programs by a widely used compiler (32-bit gcc, with no optimization).
> We can discuss more by email if you would like to.
>
> You might want to check the papers I listed in the previous email, which
> discuss several issues in translating binaries into LLVM IR,
> and also some recent research papers on disassembly itself.
>
>
> Sincerely,
> Shuai
>
>
> On Fri, Jul 17, 2015 at 12:45 PM, mats petersson <mats at planetcatfish.com>
> wrote:
>
>> For every level of translation [in terms of "human readable -> machine
>> code translation", not someone translating a literary work from one
>> language to another - although often some subtle details are lost here
>> too], a little bit of the semantic meaning is lost. This means that you can
>> almost never completely reconstruct the code in original form from the
>> machine-code, or the C-code from the LLVM IR, or the C++ code from the
>> output of something like cfront (the original C++ -> C translator), or the
>> original Pascal code from a Pascal to C compiler, etc.
>>
>> It is, at least sometimes, possible to reconstruct something that can
>> then be "compiled" [in quotes as it's a loose term in this discussion]
>> again from the binary file, but it's often lacking some of the original
>> subtlety. And there are certainly cases where the original code is very
>> hard to derive from the machine-code. I played with a "symbolic
>> disassembler" many years back, and on "well-behaved code" it would
>> reconstruct assembly code that could be recompiled, but it struggled with,
>> for example, switch statements that became a PC-relative jump table: once
>> you modified the code, it couldn't figure out where the jumps went. That
>> is just one example.
>>
>>
>> I'm pretty sure it's possible, at least for a human, to write code that is
>> nearly impossible to translate back to a higher-level language. Modern
>> compilers may not use the same kinds of obfuscation, but they will
>> certainly produce code that is complex, hard to follow, and does not use
>> the obvious instructions for a particular purpose.
>>
>> --
>> Mats
>>
>> On 17 July 2015 at 17:11, Shuai Wang <wangshuai901 at gmail.com> wrote:
>>
>>> This is not an easy task. And I believe there is *NO* (open-source) tool
>>> that can fully solve this problem (statically). Correct me if I am wrong.
>>>
>>> It would be more helpful if you could provide details about what you want
>>> to do: static or dynamic? Stripped binary or binary with symbolic
>>> information?
>>> Which compiler do you work with?
>>>
>>> Check out the papers below if you are interested.
>>>
>>> http://dl.acm.org/citation.cfm?id=2465380
>>>
>>> http://dl.acm.org/citation.cfm?id=2462165
>>>
>>>
>>>
>>> Shuai
>>>
>>>
>>>
>>> On Fri, Jul 17, 2015 at 3:09 AM, 慕冬亮 <mudongliangabcd at gmail.com> wrote:
>>>
>>>> I want to transform an ELF binary to LLVM IR and do some instrumentation
>>>> based on LLVM.
>>>> Is there any tool that can do the transformation?
>>>> Thanks in advance.
>>>>
>>>>     - mudongliang
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>
>>>>
>>>
>>
>