[llvm-dev] XRay: Demo on x86_64/Linux almost done; some questions.

Thu Jul 28 16:14:39 PDT 2016

Hello,

Can I ask why you chose to patch both function entrances and exits,
rather than patching just the entrances and (in the patches) pushing the
address of __xray_FunctionExit on the stack, so that the user function
returns normally (with RETQ, POP RIP or whatever other instruction) rather
than jumping into __xray_FunctionExit?

By patching just the function entrances, you avoid duplicating the
function ID (which currently takes space in the entrance and in every
exit) and duplicating the rest of the exit patch for each of the
potentially many function exits.

This approach also avoids reporting exits for functions whose entrances
were not reported because the functions were already running at the time
patching happened.
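
To make the point above concrete, here is a toy model (not XRay code) of a
single call that may be patched while its body is running. The function
name trace_call and the scheme labels are made up for this illustration;
the only assumption is that an entrance-only patch arranges the exit report
at entry time, while an entrance-and-exit patch checks each sled separately.

```python
def trace_call(scheme, patch_mid_call):
    """Return the events reported for one call of an instrumented function.

    scheme: "entry_and_exit" (sleds at the entrance and at every exit) or
            "entry_only" (the entrance patch redirects the return address).
    patch_mid_call: True if patching happens while the function body runs.
    """
    events = []
    patched = not patch_mid_call  # state of the sleds when the call starts
    exit_hooked = False           # models lr pointing at the exit handler

    # Function entrance.
    if patched:
        events.append("entry")
        if scheme == "entry_only":
            exit_hooked = True

    # The function body runs; patching may happen at this point.
    if patch_mid_call:
        patched = True

    # Function exit.
    if scheme == "entry_and_exit" and patched:
        events.append("exit")
    elif scheme == "entry_only" and exit_hooked:
        events.append("exit")
    return events


unmatched = trace_call("entry_and_exit", patch_mid_call=True)
matched = trace_call("entry_only", patch_mid_call=True)
```

In this model the entrance-and-exit scheme reports an exit with no matching
entrance when patching lands mid-call, and the entrance-only scheme reports
nothing for that call.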

This approach should also be faster because smaller code fits better in the
CPU cache, and patching itself should run faster (because there is less code
to modify).

Or does this approach have some issues, e.g. with exceptions, longjmp,
debuggers, etc.?

Below is example patch code for ARM (sorry, I don't have the resources to
translate it to x86 myself). The compile-time stub ("sled") would contain a
jump as the first instruction, skipping the next 28 bytes of NOPs (on ARM
each instruction takes exactly 4 bytes, unless in Thumb or a similar mode).

; Look at the disassembly to verify that the sled is inserted before the
;   instrumented function pushes caller's registers to the stack
;   (otherwise r4 may not get preserved)
PUSH {r4, lr}
ADR lr, #16 ; relative offset of after_entrance_traced
; r4 must be preserved by the instrumented function, so that
;   __xray_FunctionExit gets function ID in r4 too
LDR r4, [pc, #0] ; offset of function ID stored by the patching mechanism
; call __xray_FunctionEntry (returning to after_entrance_traced)
LDR pc, [pc, #0] ; use the address stored by the patching mechanism
.word <32-bit function ID>
.word <32-bit address of __xray_FunctionEntry>
.word <32-bit address of __xray_FunctionExit>
after_entrance_traced:
; Make the instrumented function think that it must return to
;   __xray_FunctionExit
LDR lr, [pc, #-12] ; offset of address of __xray_FunctionExit
; __xray_FunctionExit must "POP {r4, lr}" and in the end "BX lr"
; the body of the instrumented function follows

; Before patching (i.e. in sleds) the first instruction is a jump over the
;   whole stub to the first instruction in the body of the function. So the
;   lr register keeps its original value, thus no call to
;   __xray_FunctionExit occurs at the exit of the function, even if it is
;   being patched concurrently.
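
The PC-relative offsets in the stub above can be double-checked with a
little arithmetic. The sketch below assumes ARM (A32) state, where reading
pc yields the address of the current instruction plus 8, and that the stub
is laid out exactly in the order listed, one 4-byte word per line. The
slot names are made up for this check.

```python
PC_READ_AHEAD = 8  # in ARM state, pc reads as current instruction + 8
WORD = 4           # every instruction and data word is 4 bytes

# Words of the patched stub, in the order they appear in the listing.
layout = [
    "push_r4_lr",             # PUSH {r4, lr}
    "adr_lr",                 # ADR lr, #16
    "ldr_r4",                 # LDR r4, [pc, #0]
    "ldr_pc",                 # LDR pc, [pc, #0]
    "function_id",            # .word <32-bit function ID>
    "entry_addr",             # .word <address of __xray_FunctionEntry>
    "exit_addr",              # .word <address of __xray_FunctionExit>
    "after_entrance_traced",  # LDR lr, [pc, #-12]
]
offset = {name: i * WORD for i, name in enumerate(layout)}


def pc_relative(insn, target):
    """Immediate needed for `insn` to reach `target` relative to pc."""
    return offset[target] - (offset[insn] + PC_READ_AHEAD)


adr_lr_imm = pc_relative("adr_lr", "after_entrance_traced")
ldr_r4_imm = pc_relative("ldr_r4", "function_id")
ldr_pc_imm = pc_relative("ldr_pc", "entry_addr")
ldr_lr_imm = pc_relative("after_entrance_traced", "exit_addr")

stub_size = len(layout) * WORD     # total bytes occupied by the stub
sled_nop_bytes = stub_size - WORD  # bytes skipped by the sled's jump
```

Under these assumptions the immediates come out as 16, 0, 0 and -12, and
the sled's jump skips 28 bytes, matching the listing.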