[llvm-dev] Custom Binary Format Challenges

Kenneth Adam Miller via llvm-dev llvm-dev at lists.llvm.org
Sun Apr 1 21:33:01 PDT 2018


Well, position-independent code has to be woven into the final assembly,
and at least one technique uses a call/ret sequence to extract the
instruction pointer value. If that happens, then isn't it provided for
somewhere in the bitcode? I imagine so, but I don't know where to dig for
it. Then again, it may be something that is abstracted away from the
bitcode, so that it's woven in by some lower-level pass that sits right next
to instruction selection.
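
For what it's worth, here is a minimal sketch of the call/return idea in
C++. It assumes GCC or Clang, since it relies on the __builtin_return_address
builtin, which Clang lowers to the llvm.returnaddress intrinsic in the IR.
The names current_pc and ordinal_from_pc are made up for illustration, and
the ordinal formula is only a placeholder, not anything LLVM provides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the address of the instruction that follows
// the call to this function. noinline matters here: if the call were
// inlined there would be no call/return pair and no return address to read.
__attribute__((noinline)) std::uintptr_t current_pc() {
  // GCC/Clang builtin; level 0 means "the return address of this frame".
  return reinterpret_cast<std::uintptr_t>(__builtin_return_address(0));
}

// Hypothetical placeholder: maps an address to a slot in [0, slots).
// A real scheme would pick whatever mapping the lookup table expects.
unsigned ordinal_from_pc(std::uintptr_t pc, unsigned slots) {
  assert(slots > 0 && "need at least one slot");
  return static_cast<unsigned>(pc % slots);
}
```

Calling current_pc() from ordinary (non-tail-call) code yields the address
just past the call site, which can then be handed to ordinal_from_pc(pc, 8)
to pick one of eight slots.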

Brenda, could you explain your challenges/objectives to me further?

On Sun, Apr 1, 2018 at 10:39 PM, Brenda So <sogun3 at gmail.com> wrote:

> The bitcode is only a serialized representation of the IR, which is in SSA
> form. SSA form assumes an unlimited number of virtual registers, which x86
> does not offer. When bitcode is compiled to machine code, the backend takes
> the code out of SSA form and assigns physical registers. Personally, I
> don't know how to use the bitcode language to achieve what you want to do.
>
> The closest things I can think of are the LLVM MC library and the Keystone
> and Capstone projects:
>
> http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html
> http://www.keystone-engine.org/
> https://www.capstone-engine.org/
>
> In fact, I'm also looking for something similar -- to be able to specify
> the machine instructions based solely on the IR. If you find anything, let
> me know!
>
> Brenda
>
> On Sun, Apr 1, 2018 at 5:39 PM, Jeremy Lakeman <Jeremy.Lakeman at gmail.com>
> wrote:
>
>> If you can write what you want to output in C with asm statements, clang
>> can show you what the IR should look like.
>>
>> On Mon, Apr 2, 2018 at 7:35 AM, Kenneth Adam Miller via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Program counter - EIP, RIP for x86/64. I need to obtain it and pass it
>>> as an argument to the function that calculates an ordinal from it.
>>>
>>> I think that there must be some way to use the bitcode language to place
>>> byte values at a designated offset. Or use the command line to specify the
>>> section and offset for the data.
>>>
>>> On Sun, Apr 1, 2018 at 6:00 PM, Brenda So <sogun3 at gmail.com> wrote:
>>>
>>>> Hi Kenneth,
>>>>
>>>> Can you elaborate on what you mean by instruction pointer value? Like the
>>>> actual instruction with its opcode and operands? With the sample code that I
>>>> showed you, the Instruction reference (I) in the innermost for loop gives
>>>> you access to the following member functions:
>>>>
>>>> http://llvm.org/doxygen/classllvm_1_1Instruction.html
>>>>
>>>> Alternatively, you can use the dump() operation to dump the
>>>> instructions out.
>>>>
>>>> Unfortunately I don't know how to address your second question. That's
>>>> stretching my knowledge in LLVM.
>>>>
>>>> Brenda
>>>>
>>>>
>>>> On Sun, Apr 1, 2018 at 11:32 AM, Kenneth Adam Miller <
>>>> kennethadammiller at gmail.com> wrote:
>>>>
>>>>> Thank you so much!
>>>>>
>>>>> What about discovering the instruction pointer value?
>>>>> Also, does anybody know how to embed an artifact as a resource in a
>>>>> binary? I'd like to have two text sections, and have one copied in from
>>>>> another binary.
>>>>>
>>>>> On Sun, Apr 1, 2018 at 2:15 PM, Brenda So <sogun3 at gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> You can write it as if you are writing an optimization pass:
>>>>>> http://llvm.org/docs/ProgrammersManual.html
>>>>>>
>>>>>> It sounds like your highest level is a module, hence you should write
>>>>>> a module pass. There is example code in the LLVM Programmer's Manual
>>>>>> showing how to write a function pass:
>>>>>>
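>>>>>> #include "llvm/IR/Function.h"
>>>>>> #include "llvm/IR/Instructions.h"
>>>>>> #include "llvm/Pass.h"
>>>>>> using namespace llvm;
>>>>>>
>>>>>> Function* targetFunc = ...;
>>>>>> class OurFunctionPass : public FunctionPass {
>>>>>>   public:
>>>>>>     static char ID;  // pass identification used by the pass manager
>>>>>>     OurFunctionPass(): FunctionPass(ID), callCounter(0) { }
>>>>>>
>>>>>>     bool runOnFunction(Function& F) override {
>>>>>>       for (BasicBlock &B : F) {
>>>>>>         for (Instruction &I: B) {
>>>>>>           if (auto *CI = dyn_cast<CallInst>(&I)) {
>>>>>>             // We know we've encountered a call instruction, so we
>>>>>>             // need to determine if it's a call to the
>>>>>>             // function pointed to by targetFunc or not.
>>>>>>             if (CI->getCalledFunction() == targetFunc)
>>>>>>               ++callCounter;
>>>>>>           }
>>>>>>         }
>>>>>>       }
>>>>>>       return false;  // the IR was only inspected, not modified
>>>>>>     }
>>>>>>
>>>>>>   private:
>>>>>>     unsigned callCounter;
>>>>>> };
>>>>>>
>>>>>> char OurFunctionPass::ID = 0;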
>>>>>>
>>>>>> Turning the FunctionPass into a ModulePass should be pretty easy with
>>>>>> the linked guide (inherit from ModulePass instead of FunctionPass and
>>>>>> override runOnModule). Afterwards, you can build your new pass against
>>>>>> your LLVM source tree and run it using the opt tool.
>>>>>>
>>>>>> Hope I didn't misunderstand your question -- if you have any more, let
>>>>>> me know!
>>>>>>
>>>>>> Brenda
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Apr 1, 2018 at 1:48 PM, Kenneth Adam Miller via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>>
>>>>>>> I hope you are all doing well, and thanks in advance. I need to
>>>>>>> program a transformation over a set of LLVM bitcode that weaves in
>>>>>>> various techniques. In particular, I need to resolve a computed
>>>>>>> target address to one of several targets, in the same way that a
>>>>>>> function in a dynamic library is resolved, but I need this resolution
>>>>>>> to happen in the binary target of my choice, where I tell it to. It's
>>>>>>> basically the same facility you get when you compile a group of files
>>>>>>> as a shared library target. The only difference is that I need it to
>>>>>>> happen under my control, for function targets that I choose, with an
>>>>>>> argument value that I also choose serving as the ordinal to look them up.
>>>>>>>
>>>>>>> I think that I may need to write a compiler pass where this occurs,
>>>>>>> but part of the problem is that 1) I don't know how to make such a
>>>>>>> thing occur at the bitcode level, and 2) the ordinal is calculated
>>>>>>> from the instruction pointer.
>>>>>>>
>>>>>>> Can anybody help? Is there a library or function call for
>>>>>>> calculating lookup tables from a given set of targets given an ordinal? Is
>>>>>>> there a way to obtain the instruction pointer in llvm bitcode?
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> llvm-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>