[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Wed Aug 3 13:27:43 PDT 2016

Hi Dean,

I have a question about the following piece of code
in compiler-rt/trunk/lib/xray/xray_trampoline_x86.S :
  movq  _ZN6__xray19XRayPatchedFunctionE(%rip), %rax
  testq  %rax, %rax
  je  .Ltmp0

  // assume that %r10d has the function id.
  movl  %r10d, %edi
  xor  %esi,%esi
  callq  *%rax
What happens if someone unsets the handler function (i.e. calls
__xray_remove_handler() ) or changes the handler (i.e. calls
__xray_set_handler() with a different pointer to function) between "movq
 _ZN6__xray19XRayPatchedFunctionE(%rip), %rax" and "callq  *%rax" ? I
understood that the old handler will still be called, but isn't it a
problem? What if the code removing the handler starts destructing some
objects after that, so that a handler call would result in a crash or other
bad things?

The same concern for analogous code in __xray_FunctionExit .

Cheers,
Serge

On 30 July 2016 at 09:45, Dean Michael Berris <dean.berris at gmail.com> wrote:

>
> > On 30 Jul 2016, at 05:07, Serge Rogatch via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > Thanks for pointing this out, Tim. Then maybe this approach is not the
> best choice for x86, though ideally measuring is needed, it is just that on
> ARM the current x86 approach is not applicable because ARM doesn't have a
> single return instruction (such as RETQ on x86_64), furthermore, the return
> instructions on ARM can be conditional.
>
> I think this is a flaw in the current implementation. I long suspected
> that other platforms didn't have a single return instruction like x86, and
> I think it's worth changing this.
>
> In an early version of the implementation that made it into LLVM, I made
> the determination whether an instruction was a return instruction, and make
> that determination per-platform.
>
> We should make that happen so we can easier support ARM.
>
> >
> > I have another question: what happens if the instrumented function (or
> its callees) throws an exception and doesn't catch? I understood that
> currently XRay will not report an exit from this function in such case
> because the function doesn't return with RETQ, but rather the stack
> unwinder jumps through functions calling the destructors of local variable
> objects.
> >
> > If so, why not to instrument the functions by placing a tracing object
> as the first local variable, with its constructor calling
> __xray_FunctionEntry and destructor calling __xray_FunctionExit ? Perhaps
> this approach requires changes in the front-end (C++ compiler, before
> emitting IR).
>
> That's right -- the approach I've been thinking about (but haven't gotten
> to implementing yet, because of other priorities) has been to have special
> instrumentation at the throw and catch points. Similarly understanding
> things like tail call exits and sibling call optimisations and
> instrumenting those exit points is another thing that is yet to be
> implemented.
>
> Unfortunately changing the front-end to add an RAII or guard object will
> cause a whole lot of problems too. For instance, optimisations on the IR
> can move instructions before/after the guard object initialisation and then
> we end up with less accurate instrumentation. This also introduces
> potentially unwanted semantics and changes the stack layout and affecting
> the performance of an application with XRay sleds but instrumentation not
> enabled.
>
>
>
> >
> > Cheers.
> >
> > On 29 July 2016 at 21:00, Tim Northover <t.p.northover at gmail.com> wrote:
> > On 28 July 2016 at 16:14, Serge Rogatch via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> > > Can I ask you why you chose to patch both function entrances and exits,
> > > rather than just patching the entrances and (in the patches) pushing
> on the
> > > stack the address of __xray_FunctionExit , so that the user function
> returns
> > > normally (with RETQ or POP RIP or whatever else instruction) rather
> than
> > > jumping into __xray_FunctionExit?
> >
> > > This approach should also be faster because smaller code better fits
> in CPU
> > > cache, and patching itself should run faster (because there is less
> code to
> > > modify).
> >
> > It may well be slower. Larger CPUs tend to track the call stack in
> > hardware and returning to an address pushed manually is an inevitable
> > branch mispredict in those cases.
> >
> > Cheers.
> >
> > Tim.
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160803/32a24ad4/attachment.html>