<div dir="ltr"><div>Hello,</div><div><br></div>May I ask why you chose to patch both function entries and exits, rather than patching only the entries and having the entry patch push the address of __xray_FunctionExit onto the stack, so that the user function returns normally (with RETQ or whatever other return instruction it uses) instead of jumping into __xray_FunctionExit?<br><br>By patching only the function entries, you avoid duplicating the function ID (which currently takes up space in the entry sled and in every exit sled), and you avoid duplicating the rest of the exit patch at each of the potentially many function exits.<br><br>This approach also avoids reporting exits for functions whose entries were never reported because those functions were already running when patching happened.<br><br>It should also be faster: smaller code fits better in the CPU instruction cache, and patching itself should take less time because there is less code to modify.<br><br>Or does this approach have issues, e.g. with exceptions, longjmp, debuggers, etc.?<br><br>Below is example patch code for ARM (sorry, I don't have the resources to translate it to x86 myself). The compile-time stub ("sled") would contain a jump as its first instruction, skipping the remaining 28 bytes of the 32-byte stub, i.e. seven 4-byte NOPs (in ARM mode every instruction is exactly 4 bytes long, unlike in Thumb 
mode).<div><font face="calibri, sans-serif"><span style="font-size:14.6667px;line-height:16.8667px"><br></span></font></div><div><font face="calibri, sans-serif"><span style="font-size:14.6667px;line-height:16.8667px"><div>; Check the disassembly to verify that the sled is inserted before the</div><div>;   instrumented function pushes the caller's registers to the stack</div><div>;   (otherwise r4 may not be preserved)</div><div>PUSH {r4, lr}</div><div>ADR lr, #16 ; relative offset of after_entrance_traced</div><div>; r4 must be preserved by the instrumented function, so that</div><div>;   __xray_FunctionExit also gets the function ID in r4</div><div>LDR r4, [pc, #0] ; load the function ID stored by the patching mechanism</div><div>; call __xray_FunctionEntry (returning to after_entrance_traced)</div><div>LDR pc, [pc, #0] ; use the address stored by the patching mechanism</div><div>.word <32-bit function ID></div><div>.word <32-bit address of __xray_FunctionEntry></div><div>.word <32-bit address of __xray_FunctionExit></div><div>after_entrance_traced:</div><div>; Make the instrumented function think that it must return to __xray_FunctionExit</div><div>LDR lr, [pc, #-12] ; load the address of __xray_FunctionExit</div><div>; __xray_FunctionExit must "POP {r4, lr}" and finally "BX lr"</div><div>; the body of the instrumented function follows</div><div><br></div><div>; Before patching (i.e. in sleds) the first instruction is a jump over the</div><div>;   whole stub to the first instruction in the body of the function, so the lr</div><div>;   register keeps its original value and __xray_FunctionExit is not called at</div><div>;   the exit of the function, even if the function is being patched concurrently.</div></span></font></div></div>