[llvm-dev] Caller callee calling convention enforcement in C++ bin. code

Paul Muntean via llvm-dev llvm-dev at lists.llvm.org
Sat Jul 8 00:42:20 PDT 2017


>
>
> Hi Reid,
>>
>> Please see some clarifications below.
>> Thank you for your answer. It provided a lot of helpful information!
>> I've included some follow-up questions below and would really appreciate
>> your answers!
>> Further help and suggestions are highly welcome.
>>
>> The technique we use:
>> I infer the ranges of the call sites from the order in which my
>> MachineFunctionPass is invoked. As far as I can see, this order has to be
>> the same as the order in which the AsmPrinter is invoked, and therefore the
>> order in which data is written to the ELF .text section. Since the code is
>> laid out in memory relative to the start of the section, this order is
>> well defined inside a single section.
>>
>> From there, I'm currently writing code that emits EH labels (which mark
>> machine basic block edges). All I need to do now is store symbols for
>> these labels in .rodata (or a similar/custom ELF section). The loader
>> will then relocate the addresses for us, and we can check the ranges with
>> load instructions on the read-only data.
>> Some advice on how to add the relocations in a clean way would be
>> amazing :) But I think I can also figure this out myself.
>>
>> What do you mean by "the return address "VA" (I think, in ELF parlance)"?
>>
>> Here are our comments on your post.
>>
>> > Is it enough to compute the set of all possible return addresses, or do
>> > you need to limit the set to only C++ method calls? If you just need the
>> > full set of return addresses for a given DSO, I'd recommend disassembling
>> > the object after linking, scraping the output for "callq" instructions, and
>> > taking the address of the next instruction. This will give you the return
>> > address "VA" (I think, in ELF parlance), which is the address of the
>> > instruction assuming the ELF binary is loaded at the address listed in its
>> > program headers. You can compute the possible return addresses at runtime
>> > by adding the difference between the on-disk p_vaddr values and the actual
>> > addresses that the loader used at runtime. You can probably discover the
>> > load addresses with dl_iterate_phdr.
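The scraping step suggested above could look roughly like the following sketch. It assumes the text comes from GNU `objdump -d` in its default layout (address, tab, raw bytes, tab, mnemonic); other disassemblers format their output differently. The names `INSN`, `return_vas`, and `SAMPLE` are made up for illustration, and the sample listing is hand-written, not taken from a real binary.

```python
import re

# One instruction line of GNU `objdump -d` (assumed layout):
#   "<addr>:\t<hex bytes> ...\t<mnemonic> <operands>"
# Byte-only continuation lines and symbol headers do not match.
INSN = re.compile(r"^\s*([0-9a-f]+):\t(?:[0-9a-f]{2} )+\s*\t(\S+)")


def return_vas(disassembly):
    """Return the link-time VA of the instruction following each call."""
    vas = []
    after_call = False
    for line in disassembly.splitlines():
        if line.startswith("Disassembly of section"):
            # Do not pair a call with an instruction from another section.
            after_call = False
            continue
        m = INSN.match(line)
        if m is None:
            continue
        if after_call:
            vas.append(int(m.group(1), 16))
        after_call = m.group(2) in ("call", "callq")
    return vas


# Hand-written sample in the assumed objdump layout.
SAMPLE = (
    "0000000000401126 <main>:\n"
    "  401126:\t55                   \tpush   %rbp\n"
    "  401127:\te8 04 ff ff ff       \tcallq  401030 <puts@plt>\n"
    "  40112c:\t31 c0                \txor    %eax,%eax\n"
    "  40112e:\tff d0                \tcallq  *%rax\n"
    "  401130:\tc3                   \tretq\n"
)

for va in return_vas(SAMPLE):
    print(hex(va))
```

A call that is the very last instruction of a section is skipped by this sketch, since there is no following instruction line to take the address from. The result also covers every call in the object, not only C++ method calls.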
>>
>> We've made modifications to the LLVM x86 backend that allow us to find
>> and filter the call instructions at the MachineInstr level, i.e. the set of
>> calls we are interested in is known to us in the backend.
>> Right now I assume that the order in which functions are written to the
>> ELF file is based only on the order in which the X86AsmPrinter
>> MachineFunctionPass processes them.
>> Are we correct to assume this, and additionally that this order is
>> consistent across all MachineFunctionPasses added in the backend?
>> To get actual addresses relative to the image base of the ELF file, we
>> would probably have to parse (and maybe fully disassemble) the file,
>> exactly as you said.
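On the "VA" question, the arithmetic in the quoted paragraph can be written out as a small sketch. It assumes the runtime start address of a loaded segment is already known (for example from dl_iterate_phdr, where dlpi_addr is this same difference). The function names and the example addresses are hypothetical and only illustrate the calculation.

```python
def load_bias(runtime_segment_start, p_vaddr):
    """Difference between where the loader put a segment and its on-disk p_vaddr."""
    return runtime_segment_start - p_vaddr


def rebase(vas, bias):
    """Turn link-time VAs into the addresses seen at runtime."""
    return [va + bias for va in vas]


def is_known_return_address(addr, runtime_addrs):
    """Membership check against the rebased set of return addresses."""
    return addr in runtime_addrs


# Hypothetical numbers: a segment with p_vaddr 0x1000 that the loader
# mapped at 0x555555555000, and two return-address VAs from the disassembly.
bias = load_bias(0x555555555000, 0x1000)
runtime = frozenset(rebase([0x112C, 0x1130], bias))
print(hex(bias), sorted(hex(a) for a in runtime))
```

For a non-PIE executable loaded at the address in its program headers, the bias is zero and the VAs are already the runtime addresses.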
>>
>> > If you need only some specific annotated list of return addresses, you
>> > will probably have to make complicated changes to LLVM that insert labels
>> > after certain CALL instructions and emit some object file section with
>> > relocations against those labels. This is doable but complicated. You can
>> > follow the EH label machinery to see how to insert labels into the
>> > instruction stream and create relocations against them from read-only data
>> > sections.
>>
>> After looking at how EH labels are generated, I fully agree with you:
>> combined with relocations, this would be the cleaner but also considerably
>> more complicated solution.
>> For this approach, do you think it would be better to patch in an additional
>> read-only section using an external program, or to add the relocations to
>> the .rodata section emitted by LLVM?
>>
>>
>> On Thu, Jul 6, 2017 at 5:53 PM, Reid Kleckner <rnk at google.com> wrote:
>>
>>> Is it enough to compute the set of all possible return addresses, or do
>>> you need to limit the set to only C++ method calls? If you just need the
>>> full set of return addresses for a given DSO, I'd recommend disassembling
>>> the object after linking, scraping the output for "callq" instructions, and
>>> taking the address of the next instruction. This will give you the return
>>> address "VA" (I think, in ELF parlance), which is the address of the
>>> instruction assuming the ELF binary is loaded at the address listed in its
>>> program headers. You can compute the possible return addresses at runtime
>>> by adding the difference between the on-disk p_vaddr values and the actual
>>> addresses that the loader used at runtime. You can probably discover the
>>> load addresses with dl_iterate_phdr.
>>>
>>> If you need only some specific annotated list of return addresses, you
>>> will probably have to make complicated changes to LLVM that insert labels
>>> after certain CALL instructions and emit some object file section with
>>> relocations against those labels. This is doable but complicated. You can
>>> follow the EH label machinery to see how to insert labels into the
>>> instruction stream and create relocations against them from read-only data
>>> sections.
>>>
>>> On Wed, Jul 5, 2017 at 9:22 AM, Paul Muntean via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> maybe you can help with an issue I have.
>>>>
>>>> For a C++ program compiled with Clang/LLVM on x86_64 Ubuntu, I want to
>>>> recover the addresses of all call instructions (C++ object dispatches),
>>>> or directly the return addresses, which are just the addresses
>>>> immediately following each call instruction.
>>>>
>>>> I think this information is not obtainable at link time, since at that
>>>> point we only have IR code. Please correct me if I am wrong.
>>>> So my assumption is that the addresses of the call instructions and the
>>>> addresses following them become available in the compiler back end,
>>>> after the IR code is lowered to machine code.
>>>>
>>>> Does anybody have a suggestion for where in the compiler I should look?
>>>>
>>>> Since I am new to this topic, suggestions or solutions are highly
>>>> welcome.
>>>>
>>>> -Paul
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
>>
>>
>
>