[llvm-dev] Writing a Pass in LLVM MC (Machine Code) level to Analyze Assembly Code

Lele Ma via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 27 17:50:02 PST 2019


Thank you so much! That is very helpful.

Best,
Lele

On Wed, Nov 27, 2019 at 2:00 AM Aaron Smith <aaron.lee.smith at gmail.com>
wrote:

> The MC layer doesn’t have passes. There is a method called
> emitInstruction() which is called once for each MCInst as it is emitted.
>
> In the past I have accomplished what you’d like by overriding the methods
> in ObjectStreamer to buffer all the MCInst for a function, and then doing
> the analysis on the buffered instructions.
>
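The buffering approach described above can be sketched roughly as follows.
This is a minimal, self-contained illustration and not LLVM code: `Inst`,
`Streamer`, `BufferingStreamer`, and `countSyscalls` are hypothetical
stand-ins invented for this example, and the sketch assumes that every label
starts a new function. In a real tool the same idea would presumably live in
a subclass of LLVM's own streamer class, whose exact method names and
signatures differ between LLVM versions.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::MCInst: only a mnemonic is kept here.
struct Inst {
  std::string Mnemonic;
};

// Hypothetical stand-in for a streamer interface: one callback per label
// and one per instruction, delivered in emission order.
class Streamer {
public:
  virtual ~Streamer() = default;
  virtual void emitLabel(const std::string &Name) = 0;
  virtual void emitInstruction(const Inst &I) = 0;
  virtual void finish() = 0;
};

// Buffers the instructions of the current function and hands the whole
// buffer to an analysis callback once the function is complete.
// Assumption (for illustration only): every label starts a new function.
class BufferingStreamer : public Streamer {
public:
  using Analysis =
      std::function<void(const std::string &, const std::vector<Inst> &)>;

  explicit BufferingStreamer(Analysis A) : Analyze(std::move(A)) {}

  void emitLabel(const std::string &Name) override {
    flush(); // the previous function, if any, is now complete
    CurrentFunction = Name;
  }

  void emitInstruction(const Inst &I) override { Buffer.push_back(I); }

  void finish() override { flush(); }

private:
  // Runs the analysis on the buffered instructions, then clears the buffer.
  void flush() {
    if (!Buffer.empty())
      Analyze(CurrentFunction, Buffer);
    Buffer.clear();
  }

  Analysis Analyze;
  std::string CurrentFunction;
  std::vector<Inst> Buffer;
};

// Example analysis: count the `syscall` instructions in one function.
std::size_t countSyscalls(const std::vector<Inst> &Insts) {
  std::size_t N = 0;
  for (const Inst &I : Insts)
    if (I.Mnemonic == "syscall")
      ++N;
  return N;
}
```

A caller would construct a `BufferingStreamer` with a callback, feed it
labels and instructions in order, and call `finish()` at the end so that the
last function is analyzed too. Buffering per function rather than analyzing
each instruction as it arrives lets the analysis see the whole instruction
sequence at once.
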
> Here’s a link about how instructions are lowered which might shed some
> light on how all this works.
>
> https://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm
>
>
>
> On Nov 27, 2019, at 5:51 AM, Lele Ma <lelema.cn at gmail.com> wrote:
>
> 
> Hi All,
>
> Following up on my own previous question, rephrased under an updated subject:
>
> What I want to do is analyze hand-written assembly code in 'full
> detail', where the semantics of each instruction are available to LLVM
> passes. Many such instructions, such as 'syscall' and 'iret', have no
> counterparts in IR/MIR form. At the MC level, such assembly code can
> be translated to MCInst easily, since this level is closest to the assembly
> code. Therefore, I am thinking of writing a pass at the MC level instead of
> IR/MIR.
>
> However, when I searched for how to write MC-level passes, I could not find
> any related classes in the LLVM infrastructure (such as FunctionPass at the
> IR level or MachineFunctionPass at the MIR level). Could anyone point me to
> where I should start in order to write an MC-level pass?
>
> Best Regards,
> Lele
>
>
> On Mon, Nov 25, 2019 at 5:24 PM Lele Ma <lelema.cn at gmail.com> wrote:
>
>> Thank you for the instructions, Aaron and Nicolai!
>>
>> Raising a binary to LLVM IR or to MIR is a reasonable solution
>> for me. However, given Nicolai's information that not all target-specific
>> instructions are representable in MIR, I have two questions that need your
>> help:
>>
>> 1. Why does MIR not necessarily represent all target-specific
>> instructions for a given piece of hardware? If someone added support for
>> them, would this violate some design principle of MIR?
>>
>> 2. Instead of raising to IR/MIR, I am wondering whether a third path is
>> possible to solve the problem of analyzing assembly code:
>>     - write a simple LLVM pass in the `MC` layer to process information
>>       not available in MIR/IR, and
>>     - pass the analysis results from the IR/MIR passes to the MC layer
>>       pass, where we can enhance the results with the missing
>>       representations.
>> So the second question is whether it is possible to write passes directly
>> in the MC layer. If so, is there any documentation or example for that?
>>
>>
>> Thank you in advance!
>>
>> Best Regards,
>> Lele
>>
>>
>> On Mon, Nov 25, 2019 at 9:15 AM Aaron Smith <aaron.lee.smith at gmail.com>
>> wrote:
>>
>>> Llvm-mctoll will raise a binary back to LLVM IR.
>>> Not exactly what you want but it might be something you can leverage.
>>>
>>> https://github.com/microsoft/llvm-mctoll
>>>
>>> On Mon, Nov 25, 2019 at 1:19 PM Nicolai Hähnle via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> On Thu, Nov 21, 2019 at 3:37 AM Lele Ma via llvm-dev
>>>> <llvm-dev at lists.llvm.org> wrote:
>>>> > My goal is to write LLVM Machine IR (MIR) passes to analyze
>>>> > assembly source code. But it seems I need to find a way to translate
>>>> > the handwritten assembly code into MIR format first.
>>>> >
>>>> > Are there any materials, or any modules in the LLVM source code, that
>>>> > can help to translate assembly code into LLVM MIR for analysis?
>>>> >
>>>> > Or is there an easier way to analyze assembly code in MIR passes
>>>> > without translating it?
>>>>
>>>> MachineIR is designed for code generation, not for general assembly
>>>> representation. MIR is not even necessarily able to represent all
>>>> assembly instructions that a target's hardware supports. The
>>>> disassembler produces MCInsts, and if you wanted to go from there back
>>>> to MachineIR, you'd have to write your own target-specific code to do
>>>> so.
>>>>
>>>> Cheers,
>>>> Nicolai
>>>>
>>>>
>>>>
>>>> >
>>>> > Best Regards,
>>>> > Lele Ma
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > LLVM Developers mailing list
>>>> > llvm-dev at lists.llvm.org
>>>> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>>
>>>> --
>>>> Learn how the world really is,
>>>> but never forget how it should be.
>>>>
>>>