[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Fri Jul 29 06:58:17 PDT 2016

On 29 July 2016 at 10:43, Dean Michael Berris <dean.berris at gmail.com> wrote:

>
> > On 29 Jul 2016, at 09:14, Serge Rogatch via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> >
> > Hello,
> >
> > Can I ask why you chose to patch both function entrances and exits,
> > rather than patching only the entrances and, in the entrance patch,
> > pushing the address of __xray_FunctionExit onto the stack, so that the
> > user function returns normally (with RETQ or POP RIP or whatever other
> > instruction) rather than jumping into __xray_FunctionExit?
> >
> > By patching only the function entrances, you avoid duplicating the
> > function ID (which currently takes space in the entrance and in every
> > exit) and duplicating the rest of the exit patch for each of the
> > potentially many function exits.
> >
> > This approach also avoids reporting exits for functions whose entrances
> > have not been reported because the functions were already running at
> > the time patching happened.
> >
> > This approach should also be faster, because smaller code fits better in
> > the CPU cache, and patching itself should run faster (because there is
> > less code to modify).
> >
> > Or does this approach have some issues, e.g. with exceptions, longjmp,
> > debuggers, etc.?
> >
>
> The only issues I can think of are potentially interfering with and
> invalidating the stack pointer at runtime. Because the patching and the
> determination of the function IDs happen at runtime rather than
> statically, we can only reserve space for the function ID. On x86_64
> this works out to just a few bytes. We also make sure XRay works even
> if frame pointers are omitted.
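To illustrate what reserving space for a runtime-assigned function ID can look like, here is a minimal C++ sketch under loudly stated assumptions: the sled is modelled as an ordinary data buffer, and `SimulatedSled`, `patch_sled`, `unpatch_sled`, `read_function_id`, and the two "instruction" values are all hypothetical, not XRay's actual layout or code. It shows one plausible ordering: write the ID into the reserved slot first, then enable the stub with a single atomic store. Real sleds live in executable memory, so an actual implementation would also need to make the page writable and keep the instruction cache coherent; the release store here only models the intended ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Placeholder "instruction" values for illustration only; NOT real encodings.
// kJumpOverStub stands for the compile-time jump over the sled,
// kRunStub for the first instruction of the enabled stub.
constexpr std::uint32_t kJumpOverStub = 0x1u;
constexpr std::uint32_t kRunStub = 0x2u;

// Hypothetical model of one sled, held in ordinary (non-executable) memory.
struct SimulatedSled {
  std::atomic<std::uint32_t> first_word{kJumpOverStub};  // executed first
  unsigned char id_slot[4] = {0, 0, 0, 0};  // space reserved at compile time
};

// Write the runtime-assigned function ID into the reserved space first, then
// publish the enabling instruction with one atomic store, so a reader that
// observes kRunStub with acquire ordering also observes the ID.
void patch_sled(SimulatedSled &sled, std::uint32_t function_id) {
  std::memcpy(sled.id_slot, &function_id, sizeof function_id);
  sled.first_word.store(kRunStub, std::memory_order_release);
}

// Unpatching only restores the jump; the stale ID is simply skipped over.
void unpatch_sled(SimulatedSled &sled) {
  sled.first_word.store(kJumpOverStub, std::memory_order_release);
}

std::uint32_t read_function_id(const SimulatedSled &sled) {
  std::uint32_t id;
  std::memcpy(&id, sled.id_slot, sizeof id);
  return id;
}
```

Re-patching an already enabled sled with a different ID would race with readers in this model; the sketch assumes IDs are written once.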
>
> Another issue is tail call and sibling call optimisations. Because the
> exits from these functions actually turn out to be jumps, we cannot be
> sure that the jumped-to function will clean up the stack appropriately.
>
> As for avoiding exit records without entry records, we deal with those
> externally (during analysis of the trace). When instrumentation is turned
> on (i.e. the log handler is not nullptr), it's important to know that a
> function was already running and that it exited at a given point in time.
> Especially when unwinding a deep function call stack, we can keep track of
> this, as it's important information for analysis.
>
> Consider the following case:
>
> A() -> B() -> C() -> D() -> E()
>
> When instrumentation is enabled after E() has started, we can see records
> of the following kind:
>
> [timestamp, cpu] Exit E()
> [timestamp, cpu] Exit D()
> [timestamp, cpu] Exit C()
> [timestamp, cpu] Exit B()
> [timestamp, cpu] Exit A()
>
> Note that the time difference between "Exit E()" and "Exit D()" may not
> be 0, and that there may very well have been work happening between the
> exit of E() and the exit of D(), and similarly up the stack.
>
> Does this make sense?
>
Yes, this makes sense, thanks for the analysis. I'm going to investigate
later how to keep the stack consistent for unwinding (so as to support C++
exceptions), e.g. by pretending that the __xray_FunctionExit call is the
destructor of the first object (local variable) on the stack.
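Dean's point that unmatched exits are handled during trace analysis can be sketched as a stack-based pass over one thread's records. This is a minimal C++ illustration assuming a hypothetical record layout (`Record`, `Call`, and `reconstruct` are illustrative names, not XRay's log format or tooling). Entries are pushed. An exit matching the top of the stack closes a call. An exit with no matching entry, like "Exit E()" in the example above, is kept as a call with an unknown start rather than discarded. Treating an exit that mismatches the top of the stack (which could happen, for instance, around tail calls) as unmatched is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-thread record; the real XRay log format differs.
enum class Kind { Entry, Exit };

struct Record {
  std::uint64_t timestamp;
  Kind kind;
  std::uint32_t function_id;
};

struct Call {
  std::uint32_t function_id;
  bool saw_entry;            // false: already running when tracing began
  std::uint64_t entry_time;  // meaningful only if saw_entry is true
  std::uint64_t exit_time;
};

// Pair entries with exits using a stack. Exits without a matching entry on
// top of the stack are reported as calls whose start time is unknown.
std::vector<Call> reconstruct(const std::vector<Record> &trace) {
  std::vector<Call> calls;
  std::vector<Record> stack;  // open entries, innermost last
  for (const Record &r : trace) {
    if (r.kind == Kind::Entry) {
      stack.push_back(r);
    } else if (!stack.empty() && stack.back().function_id == r.function_id) {
      calls.push_back({r.function_id, true, stack.back().timestamp, r.timestamp});
      stack.pop_back();
    } else {
      calls.push_back({r.function_id, false, 0, r.timestamp});
    }
  }
  return calls;
}
```

With this policy, the A() -> B() -> C() -> D() -> E() example yields five calls with `saw_entry == false`, each still carrying its exit timestamp, while any call that starts after tracing is enabled gets a full duration.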

>
> > Below is example patch code for ARM (sorry, I have no resources to
> > translate it to x86 myself). The compile-time stub ("sled") would
> > contain a jump as the first instruction, skipping the next 28 bytes of
> > NOPs (on ARM each instruction takes exactly 4 bytes, unless in Thumb
> > mode etc.).
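The 28-byte skip can be checked by encoding the sled's first instruction. A minimal C++ sketch, assuming ARM (A32) state and an unconditional `B`: the PC reads as the instruction's address + 8 and the 24-bit immediate counts words, so a branch at offset 0 to the body at offset 32 needs imm24 = (32 - 8) / 4 = 6. `encode_arm_b` is an illustrative helper, not part of XRay.

```cpp
#include <cassert>
#include <cstdint>

// Encode the A32 instruction "B target" (condition AL, i.e. always taken)
// located at byte address `from`. In ARM state the PC reads as the
// instruction address + 8, and imm24 is a signed offset in words.
constexpr std::uint32_t encode_arm_b(std::uint32_t from, std::uint32_t to) {
  std::int32_t byte_delta = static_cast<std::int32_t>(to - (from + 8));
  std::uint32_t imm24 = static_cast<std::uint32_t>(byte_delta / 4) & 0x00FFFFFFu;
  return 0xEA000000u | imm24;  // 0xE = AL condition, 0xA = B (no link)
}

// Sled as described: a 4-byte jump followed by 28 bytes of NOPs, so the
// function body starts 32 bytes after the jump.
constexpr std::uint32_t kSledBytes = 4 + 28;
static_assert(encode_arm_b(0, kSledBytes) == 0xEA000006u,
              "imm24 = (32 - 8) / 4 = 6");
```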
> >
> > ; Look at the disassembly to verify that the sled is inserted before the
> > ;   instrumented function pushes the caller's registers to the stack
> > ;   (otherwise r4 may not get preserved)
> > ; PC-relative operands assume ARM state, where PC reads as the current
> > ;   instruction's address + 8; byte offsets within the stub are noted as +N
> > PUSH {r4, lr}            ; +0
> > ADR lr, #16              ; +4: relative offset of after_entrance_traced
> > ; r4 must be preserved by the instrumented function, so that
> > ;   __xray_FunctionExit gets the function ID in r4 too
> > LDR r4, [pc, #0]         ; +8: offset of function ID stored by the patching mechanism
> > ; call __xray_FunctionEntry (returning to after_entrance_traced)
> > LDR pc, [pc, #0]         ; +12: use the address stored by the patching mechanism
> > .word <32-bit function ID>                      ; +16
> > .word <32-bit address of __xray_FunctionEntry>  ; +20
> > .word <32-bit address of __xray_FunctionExit>   ; +24
> > after_entrance_traced:
> > ; Make the instrumented function think that it must return to __xray_FunctionExit
> > LDR lr, [pc, #-12]       ; +28: offset of address of __xray_FunctionExit
> > ; __xray_FunctionExit must "POP {r4, lr}" and at the end "BX lr"
> > ; the body of the instrumented function follows
> >
> > ; Before patching (i.e. in sleds) the first instruction is a jump over the
> > ;   whole stub to the first instruction in the body of the function. So lr
> > ;   stays original, thus no call to __xray_FunctionExit occurs at the exit
> > ;   of the function, even if it is being patched concurrently.
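Because every operand in the stub is PC-relative, the offsets are easy to get wrong. The following C++ sketch checks at compile time that each operand lands on the intended slot, using the ARM-state rule that PC reads as the current instruction's address + 8. The offsets are taken from the listing (one 4-byte slot per instruction or `.word`); the constant and function names are hypothetical.

```cpp
#include <cassert>

// Byte offsets of the stub items, one 4-byte slot each (PUSH {r4, lr} at +0).
constexpr int kAdrLr = 4;                 // ADR lr, #16
constexpr int kLdrR4 = 8;                 // LDR r4, [pc, #0]
constexpr int kLdrPc = 12;                // LDR pc, [pc, #0]
constexpr int kFunctionIdWord = 16;       // .word <function ID>
constexpr int kEntryAddrWord = 20;        // .word <__xray_FunctionEntry>
constexpr int kExitAddrWord = 24;         // .word <__xray_FunctionExit>
constexpr int kAfterEntranceTraced = 28;  // LDR lr, [pc, #-12]
constexpr int kStubBytes = 32;

// Target of a PC-relative operand: in ARM state PC = instruction address + 8.
constexpr int pc_relative_target(int insn_offset, int imm) {
  return insn_offset + 8 + imm;
}

static_assert(pc_relative_target(kAdrLr, 16) == kAfterEntranceTraced, "ADR lr");
static_assert(pc_relative_target(kLdrR4, 0) == kFunctionIdWord, "LDR r4");
static_assert(pc_relative_target(kLdrPc, 0) == kEntryAddrWord, "LDR pc");
static_assert(pc_relative_target(kAfterEntranceTraced, -12) == kExitAddrWord,
              "LDR lr");
static_assert(kAfterEntranceTraced + 4 == kStubBytes && kStubBytes == 4 + 28,
              "stub fits in the sled's 4-byte jump plus 28 NOP bytes");
```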
>
> Cool, thanks. We have an interim logging implementation for x86 that
> does naïve logging to memory and then flushes to disk regularly (I suspect
> you've already seen https://reviews.llvm.org/D21982).
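The "log to memory, flush regularly" scheme can be sketched as a fixed-capacity buffer that hands its contents to a sink whenever it fills and once more on destruction. This is a minimal, single-threaded C++ illustration, not the code in D21982: `LogRecord`, `BufferedLog`, and the sink callback are hypothetical, and a real sink would write to disk (e.g. with `std::fwrite`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical fixed-size record; not the format used in D21982.
struct LogRecord {
  std::uint64_t timestamp;
  std::uint32_t function_id;
  std::uint8_t kind;  // 0 = entry, 1 = exit
};

// Buffers records in memory and passes them to `sink` in batches.
// Not thread-safe: concurrent appends would need synchronisation.
class BufferedLog {
 public:
  using Sink = std::function<void(const LogRecord *, std::size_t)>;

  BufferedLog(Sink sink, std::size_t capacity)
      : sink_(std::move(sink)), capacity_(capacity) {
    buffer_.reserve(capacity);
  }
  BufferedLog(const BufferedLog &) = delete;
  BufferedLog &operator=(const BufferedLog &) = delete;
  ~BufferedLog() { flush(); }  // write out whatever is still buffered

  void append(const LogRecord &r) {
    buffer_.push_back(r);
    if (buffer_.size() >= capacity_) flush();
  }

  void flush() {
    if (buffer_.empty()) return;
    sink_(buffer_.data(), buffer_.size());
    buffer_.clear();
  }

  std::size_t pending() const { return buffer_.size(); }

 private:
  Sink sink_;
  std::size_t capacity_;
  std::vector<LogRecord> buffer_;
};
```

Batching amortises the I/O cost across many records, at the price of losing whatever is still buffered if the process dies before the final flush.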

No, I wasn't aware of that patch, thanks for pointing it out!

> That patch contains the very early beginnings of a test suite, so if
> you'd like to contribute the ARM implementation, I think we can review
> that patch and land it, so that you can add tests and make sure that this
> also works on ARM.
>
> I have zero experience with actually doing anything with ARM assembly and
> I'd appreciate all the help I can get to make XRay work on ARM too.
>
Yes, I am trying to port XRay in LLVM to ARM, but I'm just getting started
with LLVM.

>
> Cheers!