[llvm-dev] Caller callee calling convention enforcement in C++ bin. code

Paul Muntean via llvm-dev llvm-dev at lists.llvm.org
Sat Jul 8 00:37:53 PDT 2017


On Sat, Jul 8, 2017 at 9:36 AM, Paul Muntean <paulmuntean at gmail.com> wrote:

> Hi Reid,
>
> Thank you for your answer; it provided a lot of helpful information.
> Please see some clarifications below. I have also included a few
> follow-up questions and would really appreciate your answers.
> Further help and suggestions are highly welcome.
>
> The technique we use:
> I infer the ranges of the call sites from the order in which my
> MachineFunctionPass is invoked. As far as I can see, this order has to be
> the same as the order in which the AsmPrinter is invoked, and therefore the
> order in which data is written to the ELF .text section. Since the code is
> laid out in memory relative to the start of the section, this order is
> well defined inside a single section.
>
> From there, I'm currently writing code that emits EH labels (which mark
> machine basic block edges). All I need to do now is store symbols for
> these labels in .rodata (or a similar/custom ELF section). The loader
> will then relocate the addresses for us, and we check the ranges with load
> instructions on the read-only data.
> Some advice on how to add the relocations in a clean way would be
> amazing :) But I think I can also figure this out myself.
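
A minimal sketch of the range check described above, assuming the relocated
label addresses end up as a sorted table of half-open [begin, end) ranges in
read-only data. The AddrRange layout and the name inAllowedRange are
hypothetical and chosen only for illustration; the thread does not specify
the table format.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical table entry: a half-open address range [begin, end)
// delimited by two labels whose addresses the loader has relocated.
struct AddrRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if addr lies inside one of the ranges.
// Assumption: the table is sorted by begin and the ranges do not overlap.
bool inAllowedRange(const AddrRange* table, std::size_t count,
                    std::uintptr_t addr) {
    const AddrRange* last = table + count;
    assert(std::is_sorted(table, last,
                          [](const AddrRange& a, const AddrRange& b) {
                              return a.begin < b.begin;
                          }));
    // Find the first range that starts after addr; the candidate range,
    // if any, is the one just before it.
    const AddrRange* it = std::upper_bound(
        table, last, addr,
        [](std::uintptr_t a, const AddrRange& r) { return a < r.begin; });
    if (it == table) return false;  // addr is below every range
    --it;
    return addr < it->end;
}
```

With a sorted table the lookup is a binary search, so the check costs
O(log n) loads from the read-only section per query.
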
>
> What do you mean by "the return address "VA" (I think, in ELF parlance)"?
>
> Here are our comments on your post.
>
> > Is it enough to compute the set of all possible return addresses, or do
> > you need to limit the set to only C++ method calls? If you just need the
> > full set of return addresses for a given DSO, I'd recommend disassembling
> > the object after linking, scraping the output for "callq" instructions, and
> > taking the address of the next instruction. This will give you the return
> > address "VA" (I think, in ELF parlance), which is the address of the
> > instruction assuming the ELF binary is loaded at the address listed in its
> > program headers. You can compute the possible return addresses at runtime
> > by adding the difference between the on-disk p_vaddr values and the actual
> > addresses that the loader used at runtime. You can probably discover the
> > load addresses with dl_iterate_phdr.
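
As a rough illustration of the runtime adjustment described above, the sketch
below uses dl_iterate_phdr (glibc on Linux) to collect the load bias of every
loaded object. The dlpi_addr field is the difference between the addresses
the loader actually used and the on-disk p_vaddr values, so adding it to a
file VA gives the runtime address. The names LoadedObject, loadedObjects and
runtimeAddress are made up for this example.

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (glibc, Linux)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One entry per loaded ELF object (main executable and each DSO).
struct LoadedObject {
    std::string name;         // empty for the main executable
    std::uintptr_t loadBias;  // runtime address minus on-disk p_vaddr
};

// Callback invoked by dl_iterate_phdr once per loaded object.
static int collectObject(struct dl_phdr_info* info, std::size_t, void* data) {
    assert(data != nullptr);
    auto* out = static_cast<std::vector<LoadedObject>*>(data);
    out->push_back({info->dlpi_name ? info->dlpi_name : "",
                    static_cast<std::uintptr_t>(info->dlpi_addr)});
    return 0;  // zero means: keep iterating
}

std::vector<LoadedObject> loadedObjects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(collectObject, &objects);
    return objects;
}

// Translates a return address taken from the disassembly (a file VA)
// into the address it has in the running process.
std::uintptr_t runtimeAddress(std::uintptr_t loadBias, std::uintptr_t fileVA) {
    return loadBias + fileVA;
}
```

For a non-PIE executable the load bias is typically 0, so the file VA and
the runtime address coincide; for PIE executables and shared objects it
usually changes from run to run.
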
>
> We've made modifications to the LLVM x86 backend that allow us to find and
> filter the call instructions at the MachineInstr level, i.e. the set of
> calls we are interested in is known to us in the backend.
> Right now I assume that the order in which functions are written to the
> ELF file is based only on the order in which the X86AsmPrinter
> MachineFunctionPass processes them.
> Are we correct to assume this, and additionally that this order is
> consistent throughout all MachineFunctionPasses added in the backend?
> To get actual addresses relative to the image base of the ELF file, we
> would probably have to parse (and maybe fully disassemble) the file,
> exactly as you said.
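
The disassemble-and-scrape approach could look roughly like the sketch below.
It assumes the default tab-separated layout of GNU "objdump -d" output
(address, raw bytes, mnemonic) and records the address of the instruction
that follows each call. The function names are hypothetical, prefixed forms
such as "notrack call" are not recognized, and a call that is the very last
instruction in the listing is missed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parses the address field of an instruction line, e.g. "  401127:".
static bool parseAddress(const std::string& field, std::uint64_t& addr) {
    const std::size_t first = field.find_first_not_of(' ');
    if (first == std::string::npos || field.back() != ':') return false;
    const std::string hex = field.substr(first, field.size() - 1 - first);
    if (hex.empty() || hex.size() > 16 ||
        hex.find_first_not_of("0123456789abcdef") != std::string::npos)
        return false;
    addr = std::stoull(hex, nullptr, 16);
    return true;
}

// Returns the address of the instruction following each call/callq.
std::vector<std::uint64_t> returnAddresses(std::istream& in) {
    std::vector<std::uint64_t> result;
    bool prevWasCall = false;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t t1 = line.find('\t');
        if (t1 == std::string::npos) continue;  // header or symbol line
        std::uint64_t addr = 0;
        if (!parseAddress(line.substr(0, t1), addr)) continue;
        const std::size_t t2 = line.find('\t', t1 + 1);
        // Lines without a mnemonic field only continue the byte dump of a
        // long instruction; they do not start a new instruction.
        if (t2 == std::string::npos) continue;
        if (prevWasCall) result.push_back(addr);
        prevWasCall = line.compare(t2 + 1, 4, "call") == 0;
    }
    return result;
}

// Usage example on a small captured snippet of objdump output.
inline void exampleUsage() {
    std::istringstream in(
        "0000000000401126 <main>:\n"
        "  401126:\t55                   \tpush   %rbp\n"
        "  401127:\te8 d4 ff ff ff       \tcallq  401100 <foo>\n"
        "  40112c:\t5d                   \tpop    %rbp\n");
    const std::vector<std::uint64_t> r = returnAddresses(in);
    assert(r.size() == 1 && r[0] == 0x40112c);
}
```

The addresses obtained this way are file VAs; they still need the load bias
added before they can be compared with return addresses seen at runtime.
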
>
> > If you need only some specific annotated list of return addresses, you
> > will probably have to make complicated changes to LLVM that insert labels
> > after certain CALL instructions and emit some object file section with
> > relocations against those labels. This is doable but complicated. You can
> > follow the EH label machinery to see how to insert labels into the
> > instruction stream and create relocations against them from read-only data
> > sections.
>
> After looking at how EH labels are generated, I fully agree with you:
> combined with relocations, this would be the cleaner but also considerably
> more complicated solution.
> For this approach, do you think it would be better to patch in an
> additional read-only section using an external program, or to add the
> relocations to the .rodata section emitted by LLVM?
>
>
> On Thu, Jul 6, 2017 at 5:53 PM, Reid Kleckner <rnk at google.com> wrote:
>
>> Is it enough to compute the set of all possible return addresses, or do
>> you need to limit the set to only C++ method calls? If you just need the
>> full set of return addresses for a given DSO, I'd recommend disassembling
>> the object after linking, scraping the output for "callq" instructions, and
>> taking the address of the next instruction. This will give you the return
>> address "VA" (I think, in ELF parlance), which is the address of the
>> instruction assuming the ELF binary is loaded at the address listed in its
>> program headers. You can compute the possible return addresses at runtime
>> by adding the difference between the on-disk p_vaddr values and the actual
>> addresses that the loader used at runtime. You can probably discover the
>> load addresses with dl_iterate_phdr.
>>
>> If you need only some specific annotated list of return addresses, you
>> will probably have to make complicated changes to LLVM that insert labels
>> after certain CALL instructions and emit some object file section with
>> relocations against those labels. This is doable but complicated. You can
>> follow the EH label machinery to see how to insert labels into the
>> instruction stream and create relocations against them from read-only data
>> sections.
>>
>> On Wed, Jul 5, 2017 at 9:22 AM, Paul Muntean via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi guys,
>>>
>>> maybe you can help with an issue I have.
>>>
>>> For a C++ program compiled with Clang/LLVM on an x86_64 Ubuntu
>>> system, I want to recover the addresses of all call instructions
>>> (C++ object dispatches), or directly the return addresses, which are
>>> simply the addresses of the instructions following each call.
>>>
>>> I think that this information is not obtainable at link time, since
>>> at that moment we have only IR code. Please correct me if I am wrong.
>>> So my assumption is that the addresses of the call instructions and
>>> of the instructions following them become available in the compiler
>>> back end, after the IR code is lowered to machine code.
>>>
>>> Does anybody have a suggestion as to where in the compiler I should
>>> look?
>>>
>>> Since I am new to this topic, suggestions or solutions are highly welcome.
>>>
>>> -Paul
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>
>
> --
> Kind regards,
>
> Paul Muntean
>
>
>
>


-- 
Kind regards,

Paul Muntean