[PATCH] D26413: [XRay] Support AArch64 in compiler-rt

Serge Rogatch via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 14 07:52:04 PST 2016


rSerge marked an inline comment as done.
rSerge added inline comments.


================
Comment at: lib/xray/xray_trampoline_AArch64.S:32
+    /* Handler address is nullptr if handler is not set */
+    CMP X2, #0
+    BEQ FunctionEntry_restore
----------------
dberris wrote:
> rengolin wrote:
> > dberris wrote:
> > > rengolin wrote:
> > > > rSerge wrote:
> > > > > rengolin wrote:
> > > > > > can't you compare the address before saving and restoring all those registers?
> > > > > This situation is rare. Usually if the handler is set to `nullptr`, then the user code is unpatched, so the trampolines are not called. So there is no performance issue here.
> > > > > However, there is a race condition issue. It is important to check for `nullptr` handler as late as possible before its call. If we push the registers after checking for `nullptr` handler, we increase the chances that handler is not `nullptr` when it's checked, but `nullptr` when it's called.
> > > > > Though anyway we currently require that the handler function works correctly even if called after its removal via XRay API.
> > > > > This situation is rare. Usually if the handler is set to nullptr, then the user code is unpatched, so the trampolines are not called. So there is no performance issue here.
> > > > 
> > > > Right, makes sense.
> > > > 
> > > > > However, there is a race condition issue. It is important to check for nullptr handler as late as possible before its call. If we push the registers after checking for nullptr handler, we increase the chances that handler is not nullptr when it's checked, but nullptr when it's called.
> > > > 
> > > > I'm not sure I follow.
> > > > 
> > > > You have added the last update to be the first branch atomically, to make sure there are no race conditions. 
> > > > 
> > > > Is this about the users calling the thunk while the update is ongoing, which can see the code but still don't have a function pointer?
> > > > 
> > > > In any case, if there is a race condition, it needs to be fixed, not work by chance.
> > > > I'm not sure I follow.
> > > > 
> > > > You have added the last update to be the first branch atomically, to make sure there are no race conditions.
> > > >
> > > > Is this about the users calling the thunk while the update is ongoing, which can see the code but still don't have a function pointer?
> > > >
> > > > In any case, if there is a race condition, it needs to be fixed, not work by chance.
> > > 
> > > In x86_64 at least, there's no real "race" here because the pointer is implemented as a `std::atomic<function_ptr>`. The sequence here though at a high level should be:
> > > 
> > > 1. The code executing somehow knew to get to this trampoline.
> > > 1. While some threads are running through this trampoline, another updates the global atomic pointer to `nullptr`.
> > > 1. All threads that encounter the load for the function pointer should see the updated value of the pointer (since it "happens before" the load of the pointer).
> > > 
> > > This is at least the reason why the load of the pointer happens after the registers have been stashed onto the stack.
> > > 
> > > Does that make sense?
> > > 2. While some threads are running through this trampoline, another updates the global atomic pointer to nullptr.
> > 
> > Hum, I wasn't expecting the pointer to move back to nullptr at any time.
> > 
> > In this case, there's still the race condition that the value changes in between the comparison (`CMP X2, #0`) and the jump (`BEQ`) which is, admittedly, very unlikely, but non-zero.
> > Hum, I wasn't expecting the pointer to move back to nullptr at any time.
> 
> Yeah, that's a feature (see `__xray_remove_handler()`) to allow for turning off XRay logging at runtime in a sequence that guarantees things will continue to work. To disable XRay at runtime, users may:
> 
> 1. Call `__xray_remove_handler()` to ensure that the current installed logging handler is removed atomically (in a cross-platform manner).
> 1. Call `__xray_unpatch()` to return the state of the sleds to "neutral".
> 
> Both of these are optional, could be performed in any order (thus operations need to be ensured thread-safe).
> 
> > In this case, there's still the race condition that the value changes in between the comparison (CMP X2, #0) and the jump (BEQ) which is, admittedly, very unlikely, but non-zero.
> 
> The way we handle this in x86_64 at least is to load the value from the global to a register, then perform the comparison with an immediate value (0). It looks like this code is already doing almost exactly the same thing -- maybe we need to be ensuring that the load is synchronised? My non-familiarity with ARM64 is showing here.
There is a possibility that the handler is removed or changed (in another thread) after current thread's `BEQ`, but before the old handler is called, or even when the current thread is in the handler. I think we had this discussion for x86_64 XRay and decided that the code of the handler must be implemented in such a way that it handles decently the situation when the handler function is called after it is removed or changed via XRay API. Complete elimination of the possibility that the old handler is called would require heavy synchronization on the XRay side and will still not eliminate the possibility that the handler is executing in 1 thread and is being removed in another thread. This heavy synchronization is undesirable for the tracing component (XRay), so it seems better to impose some restrictions on the handler code to allow spurious calls (like condition variables allow spurious wakeups, because it's too costly to avoid them).
I just minimized the chances of spurious handler call by moving the handler check as close as possible to its call.


https://reviews.llvm.org/D26413





More information about the llvm-commits mailing list