[PATCH] D26413: [XRay] Support AArch64 in compiler-rt

Dean Michael Berris via llvm-commits <llvm-commits at lists.llvm.org>
Mon Nov 14 16:37:18 PST 2016


dberris added a comment.

In https://reviews.llvm.org/D26413#594413, @rengolin wrote:

> Right, so I think the current scenario is:
>
> 1. Disabled traces won't get there at all, as they'll branch +32 directly.
> 2. Enabled traces will make use of the push/pop sequence, so no slow down.
> 3. In rare cases, the handler will be removed in between an enabled call, which adds sync problems.


For 3, there are a few things working together:

- When tracing is enabled, the sleds have been overwritten to jump/call into the trampolines. The code in this patch defines those trampolines. Note that the sleds can be made to jump/call into the trampolines even if the global log handler is `nullptr`, because that handler is already defined to be an `std::atomic<function_ptr>`.
- The global `XRayPatchedFunction` is the installed "log handler". This may be `nullptr`, and the trampolines should be loading it atomically (potentially synchronised on some platforms) before calling it. On x86_64 we're almost assured that this is safe with a normal `mov` instruction. This is independent of whether XRay instrumentation is on/off (i.e. it's not required that the sleds are patched or unpatched for this pointer to be defined).
- The trampolines (defined in assembly for complete control) must check whether the function pointer in `XRayPatchedFunction` is valid (i.e. not nullptr) before invoking it. The trampolines still have to abide by the calling conventions of the platform, and therefore must not assume that the pointer is non-null (they must gracefully handle the case when it is nullptr); see the rough sketch after this list.
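
To make that last point concrete, here is a rough C++ rendering of what a trampoline has to do. The names (`HandlerFn`, the entry-type value) are illustrative stand-ins rather than the exact declarations from xray_interface.cc, and the real trampolines do this in assembly so they can fully control register usage:

```
#include <atomic>
#include <cstdint>

// Illustrative stand-ins; the real handler type and global live in the XRay
// runtime (xray_interface.cc).  Defined here only so the sketch is
// self-contained.
using HandlerFn = void (*)(int32_t FuncId, uint16_t EntryType);
std::atomic<HandlerFn> XRayPatchedFunction{nullptr};

// What a trampoline must do, expressed as C++: atomically load the handler
// pointer, bail out if it is nullptr, and only then make the call.  The
// exact memory ordering needed for the load is part of this discussion.
void TrampolineSketch(int32_t FuncId) {
  HandlerFn Handler = XRayPatchedFunction.load(std::memory_order_acquire);
  if (Handler == nullptr)
    return;  // No handler installed (or it was just removed): do nothing.
  Handler(FuncId, /*EntryType=*/0);
}
```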

> To fix 3, there are three ways:
> 
> 1. Make that an atomic pointer, complicating the thunk code making it bigger and slower.

We already do this for x86_64, where it amounts to a normal `mov` instruction. Maybe something else needs to be done for ARM64?
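
For context on what "a normal `mov`" means: with the global being a `std::atomic`, a relaxed load of an aligned pointer-sized value lowers to a single plain load on x86_64, and my (unconfirmed) understanding is that the same holds on AArch64 with a plain `ldr`, while acquire ordering would use `ldar` instead. A minimal sketch, with the same illustrative declarations as above:

```
#include <atomic>
#include <cstdint>

using HandlerFn = void (*)(int32_t, uint16_t);   // illustrative stand-in
extern std::atomic<HandlerFn> XRayPatchedFunction;

// A relaxed atomic load of an aligned pointer: a plain `mov` on x86_64, and
// (as far as I understand) a plain `ldr` on AArch64; `ldar` only shows up if
// acquire ordering is requested.
HandlerFn LoadHandlerRelaxed() {
  return XRayPatchedFunction.load(std::memory_order_relaxed);
}
```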

> 2. Make sure any update to the function pointer is atomic with regard to its use, which complicates the tracing logic.

This is already the case in the C++ code (xray_interface.cc, see `__xray_set_handler(...)`). I'm not sure whether the ARM64 assembler needs to do anything different to load the function pointer into a register with sequentially consistent, or even acquire/release, semantics.
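
For reference, the store side amounts to something like the sketch below; the actual implementation is in xray_interface.cc, and the memory ordering shown is an assumption on my part rather than a claim about the current code:

```
#include <atomic>
#include <cstdint>

using HandlerFn = void (*)(int32_t, uint16_t);   // illustrative stand-in
extern std::atomic<HandlerFn> XRayPatchedFunction;

// Sketch of the store side: installing a handler is an atomic store of the
// pointer, and removing it is an atomic store of nullptr.
int SetHandlerSketch(HandlerFn Entry) {
  XRayPatchedFunction.store(Entry, std::memory_order_release);
  return 1;
}

int RemoveHandlerSketch() {
  return SetHandlerSketch(nullptr);
}
```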

> 3. Reduce the probability in the tracing code (some atomics/volatile) and "hope" that the distance between `CMP` and `BEQ` is small enough.
> 
>   We seem to be going for 3.

At least on x86_64, because we copy the pointer into a register at one point and then compare the register, the original data may already have changed by the time we make the call. As Serge mentioned, we implicitly require that the handler function be able to handle spurious invocations.
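
In other words, a handler cannot treat "I was called" as proof that it is still the installed handler. A minimal example of a handler that tolerates such spurious invocations, again with an illustrative signature:

```
#include <atomic>
#include <cstdint>

// State with process lifetime: safe to touch even if the handler fires after
// it has been removed on another thread.
static std::atomic<uint64_t> EventCount{0};

// This handler tolerates spurious invocations because it only touches state
// that outlives any in-flight trampoline call; running "late" after removal
// is harmless.
void CountingHandler(int32_t /*FuncId*/, uint16_t /*EntryType*/) {
  EventCount.fetch_add(1, std::memory_order_relaxed);
}
```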

> Given that this only affects XRay, I don't mind this implementation, as long as the users are aware of it. The XRay documentation [1] doesn't seem to cover that.
> 
> @dberris, it's up to you, really. :)

Thanks -- I'll fix the documentation to at least make clear what the requirements and expectations for the log handler function are.

The part I have no knowledge of is the ARM64 assembly: whether there's a difference between a synchronised/atomic load of a pointer-sized value into a register and a normal load operation. If the way it's written now is already thread-safe (even if only with relaxed ordering), then this is fine by me. :)

> cheers,
> --renato

Thanks Renato!

> [1] http://llvm.org/docs/XRay.html




https://reviews.llvm.org/D26413




