[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Fri Jul 29 00:43:53 PDT 2016

> On 29 Jul 2016, at 09:14, Serge Rogatch via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hello,
> 
> Can I ask you why you chose to patch both function entrances and exits, rather than just patching the entrances and (in the patches) pushing on the stack the address of __xray_FunctionExit, so that the user function returns normally (with RETQ, POP RIP, or whatever other instruction) rather than jumping into __xray_FunctionExit?
> 
> By patching just the function entrances, you avoid duplication of the function ID (which is currently taking space in the entrance and every exit) and duplication of the rest of the exit patch for each of the potentially many function exits.
> 
> This approach also avoids reporting exits for functions whose entrances were never reported because those functions were already running at the time patching happened.
> 
> This approach should also be faster because smaller code better fits in CPU cache, and patching itself should run faster (because there is less code to modify).
> 
> Or does this approach have some issues e.g. with exceptions, longjmp, debugger, etc.?
> 

The only issues I can think of are those of potentially interfering with and invalidating the stack pointer at runtime. Because the patching and the determination of the function IDs happen at runtime and not statically, we can only provide the space for the function ID. On x86_64 this works out to just a few bytes. We also make sure XRay works even if frame pointers are omitted.

Another issue is that of tail call and sibling call optimisations. Because exiting these functions actually turns out to be a jump, we cannot be sure that the jumped-to function will clean up the stack appropriately.

As far as avoiding writing exit records without entry records, we deal with those externally (during analysis of the trace). It's important to know, when instrumentation is turned on (i.e. the log handler is not nullptr), that a function was already running and that it exited at a given point in time. Especially when unwinding a deep function call stack, we can keep track of this as it's important information for analysis.

Consider the following case:

A() -> B() -> C() -> D() -> E()

When instrumentation is enabled after E() has started, we can see records of the following kind:

[timestamp, cpu] Exit E()
[timestamp, cpu] Exit D()
[timestamp, cpu] Exit C()
[timestamp, cpu] Exit B()
[timestamp, cpu] Exit A()

Note that the time difference between "Exit E()" and "Exit D()" may not be 0 -- there may very well have been work happening between the exit of E() and the exit of D(), and similarly up the stack.
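
A minimal sketch of how that kind of external analysis could treat exit records that have no matching entry. The `Record` type, its field names, and the `pair_records` helper are hypothetical and only illustrate the idea; they are not XRay's actual log format or tooling.

```python
from dataclasses import dataclass


@dataclass
class Record:
    timestamp: int  # hypothetical tick count
    cpu: int
    kind: str       # "entry" or "exit"
    func: str


def pair_records(records):
    """Pair entries with exits for one thread's records, in timestamp order.

    Returns (completed, orphans): completed holds (function, duration) for
    calls seen from entry to exit; orphans holds exit records for functions
    that were already running when instrumentation was turned on.
    """
    stack = []
    completed = []
    orphans = []
    for rec in records:
        if rec.kind == "entry":
            stack.append(rec)
        elif stack and stack[-1].func == rec.func:
            entry = stack.pop()
            completed.append((rec.func, rec.timestamp - entry.timestamp))
        else:
            # No matching entry was logged: keep the record instead of dropping it.
            orphans.append(rec)
    return completed, orphans


# Instrumentation is enabled while E() is running; C() later calls F().
trace = [
    Record(100, 0, "exit", "E"),
    Record(130, 0, "exit", "D"),
    Record(135, 0, "entry", "F"),
    Record(150, 0, "exit", "F"),
    Record(170, 0, "exit", "C"),
]
completed, orphans = pair_records(trace)
```

Here `orphans` keeps the exits of E(), D() and C() with their timestamps, so the time spent between them remains available to the analysis.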

Does this make sense?

> Below is an example patch code for ARM (sorry, no resources to translate to x86 myself). The compile-time stub ("sled") would contain a jump as the first instruction, skipping the next 28 bytes of NOPs (on ARM each instruction takes exactly 4 bytes, unless in Thumb or a similar mode).
> 
> ; Look at the disassembly to verify that the sled is inserted before the
> ;   instrumented function pushes caller's registers to the stack
> ;   (otherwise r4 may not get preserved)
> PUSH {r4, lr}
> ADR lr, #16 ; relative offset of after_entrance_traced
> ; r4 must be preserved by the instrumented function, so that
> ;   __xray_FunctionExit gets function ID in r4 too
> LDR r4, [pc, #0] ; offset of function ID stored by the patching mechanism
> ; call __xray_FunctionEntry (returning to after_entrance_traced)
> LDR pc, [pc, #0] ; use the address stored by the patching mechanism
> .word <32-bit function ID>
> .word <32-bit address of __xray_FunctionEntry>
> .word <32-bit address of __xray_FunctionExit>
> after_entrance_traced:
> ; Make the instrumented function think that it must return to __xray_FunctionExit
> LDR lr, [pc, #-12] ; offset of address of __xray_FunctionExit
> ; __xray_FunctionExit must "POP {r4, lr}" and in the end "BX lr"
> ; the body of the instrumented function follows
> 
> ; Before patching (i.e. in sleds) the first instruction is a jump over the
> ;   whole stub to the first instruction in the body of the function. So lr
> ;   register keeps its original value, thus no call to __xray_FunctionExit
> ;   occurs at the exit of the function, even if it is being patched concurrently.
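
The PC-relative offsets in the quoted listing can be sanity-checked with a little arithmetic. This is a sketch that assumes ARM (A32) state, where reading pc yields the address of the current instruction plus 8, and that treats each immediate in the listing as an offset from that value. The `layout` table and the `pc_relative_target` helper are hypothetical names used only for this check.

```python
PC_READ_AHEAD = 8  # in ARM (A32) state, pc reads as the instruction's address + 8

# Byte offsets from the start of the patched sled, following the quoted listing.
layout = {
    "push": 0,                    # PUSH {r4, lr}
    "adr_lr": 4,                  # ADR lr, #16
    "ldr_r4": 8,                  # LDR r4, [pc, #0]
    "ldr_pc": 12,                 # LDR pc, [pc, #0]
    "word_function_id": 16,       # .word <function ID>
    "word_entry_addr": 20,        # .word <address of __xray_FunctionEntry>
    "word_exit_addr": 24,         # .word <address of __xray_FunctionExit>
    "after_entrance_traced": 28,  # LDR lr, [pc, #-12]
}


def pc_relative_target(instr_offset, immediate):
    """Offset (from the sled start) that a PC-relative operand refers to."""
    return instr_offset + PC_READ_AHEAD + immediate


targets = {
    "adr_lr": pc_relative_target(layout["adr_lr"], 16),
    "ldr_r4": pc_relative_target(layout["ldr_r4"], 0),
    "ldr_pc": pc_relative_target(layout["ldr_pc"], 0),
    "ldr_lr": pc_relative_target(layout["after_entrance_traced"], -12),
}

# Total size: the last instruction starts at offset 28 and is 4 bytes long.
sled_size = layout["after_entrance_traced"] + 4
```

Under these assumptions `targets` comes out as 28, 16, 20 and 24: the return label, the function ID word, the __xray_FunctionEntry word and the __xray_FunctionExit word. `sled_size` is 32 bytes, which matches one 4-byte jump plus the 28 bytes it skips in the unpatched sled.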

Cool, thanks -- we have an interim logging implementation for x86 that does naïve logging to memory and then flushes to disk regularly (I suspect you've already seen https://reviews.llvm.org/D21982). That patch has the very early beginnings of a test suite, so if you'd like to contribute the ARM implementation, we can review and land that patch first, which would let you add tests and make sure that this also works on ARM.

I have zero experience with actually doing anything with ARM assembly and I'd appreciate all the help I can get to make XRay work on ARM too.

Cheers!