<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="m_-4141157983555514280m_1929002210227028763gmail-moz-text-html" lang="x-western">
<div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">Hi Reid,</div><div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black"><br></div><div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">Please see some clarifications below. </div><div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black"><span style="font-size:14.6667px">Thank you for your answer. It did provide a lot of helpful information! I've included some follow-up questions below and would really appreciate your answers! </span><br></div><div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">Further help and suggestions are highly welcome.</div><div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
<br>
</div><div style="margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">The technique we use:</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
I infer the ranges of the call sites from the order in which my
MachineFunctionPass is invoked. As far as I can see, this order has to
be the same as the order in which the AsmPrinter is invoked, and
therefore the order in which data is written to the ELF .text
section. Since the code is laid out in memory relative to the start of
the section, this order is well defined within a single section.
<br>
<br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
From there, I'm currently writing code that emits EH labels (which mark
machine basic block edges). All I need to do now is store references to
these labels in .rodata (or a similar/custom ELF section). The
loader will then relocate the addresses for us, and we check
the ranges with load instructions on the read-only data. <br>
</div>
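<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
A minimal sketch of the range check we have in mind, assuming the table ends up as a sorted array of non-overlapping half-open ranges. The names AddrRange and in_any_range are hypothetical and only serve as illustration; the real table layout is not decided yet.<br>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of one table entry: a half-open address range
// [begin, end) delimited by two emitted labels.
struct AddrRange {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true if `addr` lies inside one of the ranges.
// Assumes the table is sorted by `begin` and the ranges do not overlap.
inline bool in_any_range(const AddrRange* table, std::size_t count,
                         std::uintptr_t addr) {
    assert(table != nullptr || count == 0);
    const AddrRange* last = table + count;
    // Find the first entry whose begin is strictly greater than addr.
    const AddrRange* it = std::upper_bound(
        table, last, addr,
        [](std::uintptr_t a, const AddrRange& r) { return a < r.begin; });
    if (it == table) {
        return false;  // addr lies before every range (or the table is empty)
    }
    --it;  // the only candidate: the last range with begin <= addr
    return addr < it->end;
}
```

<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
The binary search keeps the check logarithmic in the number of ranges; a linear scan would work just as well for small tables.<br>
</div>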
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Some advice on how to add the relocations in a clean way would be amazing :) But I think I can also figure this out myself.<br>
</div><div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black"><br></div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
What do you mean by "the return address "VA" (I think, in ELF parlance)"?
<br><br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Here are our comments on your post.<br></div><span><div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black"><br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
> Is it enough to compute the set of all possible return addresses,
or do you need to limit the set to only C++ method calls? If you just
need the full set of return addresses for a given DSO, I'd recommend
disassembling the object after linking, scraping the
output for "callq" instructions, and taking the address of the next
instruction. This will give you the return address "VA" (I think, in ELF
parlance), which is the address of the instruction assuming the ELF
binary is loaded at the address listed in its program
headers. You can compute the possible return addresses at runtime by
adding the difference between the on-disk p_vaddr values and the actual
addresses that the loader used at runtime. You can probably discover the
load addresses with dl_iterate_phdr.
<br><br>
</div>
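<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
As we understand the suggestion above, the runtime adjustment could be sketched roughly as follows on Linux/glibc. dl_iterate_phdr and the dlpi_addr/dlpi_name fields are the real interface from link.h; LoadedObject, loaded_objects and runtime_address are hypothetical helper names for illustration only.<br>
</div>

```cpp
#include <link.h>  // dl_iterate_phdr, struct dl_phdr_info (Linux/glibc)

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One loaded ELF object as reported by the dynamic loader.
struct LoadedObject {
    std::string name;     // path of the object; typically empty for the main program
    std::uintptr_t bias;  // dlpi_addr: runtime address minus on-disk p_vaddr
};

// Callback for dl_iterate_phdr; records the name and load bias of each object.
static int collect_object(struct dl_phdr_info* info, std::size_t /*size*/,
                          void* data) {
    assert(data != nullptr);
    auto* out = static_cast<std::vector<LoadedObject>*>(data);
    out->push_back({info->dlpi_name ? info->dlpi_name : "",
                    static_cast<std::uintptr_t>(info->dlpi_addr)});
    return 0;  // returning 0 continues the iteration
}

// Lists all objects currently mapped into the process.
inline std::vector<LoadedObject> loaded_objects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(collect_object, &objects);
    return objects;
}

// Converts an address taken from the on-disk disassembly (the "VA")
// into the address the instruction has in the running process.
inline std::uintptr_t runtime_address(std::uintptr_t bias,
                                      std::uintptr_t file_va) {
    return bias + file_va;
}
```

<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
For each object of interest, runtime_address(obj.bias, file_va) would then give the return address as it appears on the stack at run time.<br>
</div>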
</span><div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
We've made modifications to the LLVM x86 backend that allow us to find
and filter the call instructions at the MachineInstr level, i.e., the set
of calls we are interested in is known to us in the backend.
<br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Right now I assume that the order in which functions are written to the
ELF file is only based on the order in which the X86AsmPrinter
MachineFunctionPass processes them.
<br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Are we correct to assume this, and additionally that this order is
consistent across all MachineFunctionPasses added in the backend?
<br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
To get actual addresses relative to the image base of the ELF file, we
would probably have to parse (and maybe fully disassemble) the file.
Exactly as you said.
<br><br>
</div><span>
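<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
A rough sketch of the disassembly-scraping step, assuming the usual tab-separated line format of objdump -d (address, raw bytes, instruction text). The helper names parse_insn_line and return_addresses are hypothetical, and calls printed with a prefix (for example "bnd call") are not handled here.<br>
</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parses one line of `objdump -d` output of the form
//   "  401136:\te8 f5 ff ff ff       \tcallq  401130 <foo>"
// into its address, the number of encoded bytes, and the instruction text.
// Returns false for headers, blank lines, and anything else that does not match.
static bool parse_insn_line(const std::string& line, std::uint64_t& addr,
                            std::size_t& nbytes, std::string& text) {
    const std::size_t colon = line.find(":\t");
    if (colon == std::string::npos) return false;
    std::string a = line.substr(0, colon);
    const std::size_t first = a.find_first_not_of(' ');
    if (first == std::string::npos) return false;
    a = a.substr(first);
    if (a.find_first_not_of("0123456789abcdef") != std::string::npos) return false;
    addr = std::stoull(a, nullptr, 16);

    const std::size_t bytes_begin = colon + 2;
    const std::size_t tab = line.find('\t', bytes_begin);
    const std::string bytes = line.substr(
        bytes_begin, tab == std::string::npos ? std::string::npos : tab - bytes_begin);
    std::istringstream in(bytes);
    std::string tok;
    nbytes = 0;
    while (in >> tok) ++nbytes;  // one token per encoded byte
    text = (tab == std::string::npos) ? std::string() : line.substr(tab + 1);
    return nbytes > 0;
}

// Returns, for every call instruction, the address of the byte that follows it.
inline std::vector<std::uint64_t> return_addresses(
    const std::vector<std::string>& lines) {
    std::vector<std::uint64_t> out;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        std::uint64_t addr = 0;
        std::size_t n = 0;
        std::string text;
        if (!parse_insn_line(lines[i], addr, n, text)) continue;
        if (text.compare(0, 4, "call") != 0) continue;  // matches call, callq, ...
        assert(n > 0);
        std::uint64_t end = addr + n;
        // Long encodings may wrap onto continuation lines that carry only bytes.
        std::uint64_t a2 = 0;
        std::size_t n2 = 0;
        std::string t2;
        while (i + 1 < lines.size() && parse_insn_line(lines[i + 1], a2, n2, t2) &&
               t2.empty() && a2 == end) {
            end += n2;
            ++i;
        }
        out.push_back(end);
    }
    return out;
}
```

<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Adding the byte count to the call's own address avoids depending on what the following line of the listing happens to contain.<br>
</div>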
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
> If you need only some specific annotated list of return addresses,
you will probably have to make complicated changes to LLVM that insert
labels after certain CALL instructions and emit some object file section
with relocations against those labels. This
is doable but complicated. You can follow the EH label machinery to see
how to insert labels into the instruction stream and create relocations
against them from read-only data sections.
<br>
<br>
</div>
</span><div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
After looking at how EH labels are generated, I fully agree with you:
combined with relocations, this would be the cleaner but also
considerably more complicated solution.
<br>
</div>
<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
Do you think for this approach it would be better to patch an additional
read-only section using an external program, or to add the relocations
to the .rodata section emitted by LLVM?
<br>
<br>
</div>
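<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
To illustrate what a relocated address table in its own section looks like from the consumer's side, here is a source-level sketch. It is not the backend change itself: function addresses stand in for the labels after call instructions, and the section name ret_addrs as well as site_a, site_b, table_size and table_contains are made up for the example. It relies on GNU ld and lld defining __start_/__stop_ symbols for sections whose names are valid C identifiers.<br>
</div>

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for code locations. In the real scheme these would be labels
// emitted by the backend right after selected call instructions.
extern "C" void site_a() {}
extern "C" void site_b() {}

// Each entry is one pointer-sized slot in the custom section "ret_addrs".
// The initializer becomes a relocation against the target symbol, which is
// resolved by the linker or the loader before the table is read.
using Entry = void (*)();
__attribute__((used, section("ret_addrs"))) static Entry const entry_a = &site_a;
__attribute__((used, section("ret_addrs"))) static Entry const entry_b = &site_b;

// Section bounds provided by the linker (GNU ld and lld) because the section
// name is a valid C identifier.
extern "C" Entry const __start_ret_addrs[];
extern "C" Entry const __stop_ret_addrs[];

// Number of entries that ended up in the section.
inline std::size_t table_size() {
    return static_cast<std::size_t>(__stop_ret_addrs - __start_ret_addrs);
}

// Linear lookup over the relocated table; the entry order is not assumed.
inline bool table_contains(Entry e) {
    assert(e != nullptr);
    for (const Entry* p = __start_ret_addrs; p != __stop_ret_addrs; ++p) {
        if (*p == e) return true;
    }
    return false;
}
```

<div dir="auto" style="direction:ltr;margin:0px;padding:0px;font-family:sans-serif;font-size:11pt;color:black">
An externally patched section would need the same kind of bounds symbols or a header, so the lookup side stays similar whichever variant turns out to be preferable.<br>
</div>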
</div></div><div class="gmail_extra"><div><div class="m_-4141157983555514280h5"><br><div class="gmail_quote">On Thu, Jul 6, 2017 at 5:53 PM, Reid Kleckner <span dir="ltr"><<a href="mailto:rnk@google.com" target="_blank">rnk@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Is it enough to compute the set of all possible return addresses, or do you need to limit the set to only C++ method calls? If you just need the full set of return addresses for a given DSO, I'd recommend disassembling the object after linking, scraping the output for "callq" instructions, and taking the address of the next instruction. This will give you the return address "VA" (I think, in ELF parlance), which is the address of the instruction assuming the ELF binary is loaded at the address listed in its program headers. You can compute the possible return addresses at runtime by adding the difference between the on-disk p_vaddr values and the actual addresses that the loader used at runtime. You can probably discover the load addresses with dl_iterate_phdr.<div><br></div><div>If you need only some specific annotated list of return addresses, you will probably have to make complicated changes to LLVM that insert labels after certain CALL instructions and emit some object file section with relocations against those labels. This is doable but complicated. 
You can follow the EH label machinery to see how to insert labels into the instruction stream and create relocations against them from read-only data sections.</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="m_-4141157983555514280m_1929002210227028763h5">On Wed, Jul 5, 2017 at 9:22 AM, Paul Muntean via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="m_-4141157983555514280m_1929002210227028763h5"><div dir="ltr"><div class="m_-4141157983555514280m_1929002210227028763m_9151030959051609149m_-4400195901339444270gmail_signature"><div dir="ltr"><div><span style="font-family:Arial,Helvetica,sans-serif;font-size:13px">Hi guys,</span><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">maybe you can help with an issue which I have.</p><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">I want to recuperate for a C++ program compiled with Clang/LLVM on an<br>Ubuntu CPU x86_64 bit architecture all the addresses of the call<br>instructions (C++ object dispatches) or directly the return address<br>which are just the next address after a call instruction.</p><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">I think that this information is not obtainable during link time since<br>we have at that moment only IR code. 
Please correct me if I am wrong.<br>So my assumption is that in the compiler back end, after the IR code is<br>lowered to machine code, the addresses of the call instructions<br>and the addresses next to the call instructions are available.</p><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">Does anybody have a suggestion about the possible places in the compiler<br>where I should look?</p><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">Since I am new to this topic, suggestions or solutions are highly welcome.</p><span class="m_-4141157983555514280m_1929002210227028763m_9151030959051609149HOEnZb"><font color="#888888"><p style="margin:1em 0px;padding:0px;border:0px;line-height:normal;font-family:Arial,Helvetica,sans-serif;font-size:13px">-Paul</p><div><div><div><div><div></div></div></div></div></div></font></span></div></div></div>
</div>
<br></div></div>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div>
</blockquote></div><br><div><br></div></div></div><span class="m_-4141157983555514280HOEnZb"><font color="#888888"><div class="m_-4141157983555514280m_1929002210227028763gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><br><div><div><div><div><div></div></div></div></div></div></div></div></div>
</font></span></div>
</blockquote></div><br><div><br></div></div></div></div></div></blockquote></div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><span style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="border-collapse:collapse;font-family:arial,sans-serif;font-size:13px"><br></span><br><div><div><div><div><div></div></div></div></div></div></div></div></div>
</div></div>