[PATCH] D26413: [XRay] Support AArch64 in compiler-rt

Dean Michael Berris via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 14 04:42:07 PST 2016


dberris added inline comments.


================
Comment at: lib/xray/xray_trampoline_AArch64.S:32
+    /* Handler address is nullptr if handler is not set */
+    CMP X2, #0
+    BEQ FunctionEntry_restore
----------------
rengolin wrote:
> dberris wrote:
> > rengolin wrote:
> > > rSerge wrote:
> > > > rengolin wrote:
> > > > > can't you compare the address before saving and restoring all those registers?
> > > > This situation is rare. Usually if the handler is set to `nullptr`, then the user code is unpatched, so the trampolines are not called. So there is no performance issue here.
> > > > However, there is a race condition issue. It is important to check for `nullptr` handler as late as possible before its call. If we push the registers after checking for `nullptr` handler, we increase the chances that handler is not `nullptr` when it's checked, but `nullptr` when it's called.
> > > > Though anyway we currently require that the handler function works correctly even if called after its removal via XRay API.
> > > > This situation is rare. Usually if the handler is set to nullptr, then the user code is unpatched, so the trampolines are not called. So there is no performance issue here.
> > > 
> > > Right, makes sense.
> > > 
> > > > However, there is a race condition issue. It is important to check for nullptr handler as late as possible before its call. If we push the registers after checking for nullptr handler, we increase the chances that handler is not nullptr when it's checked, but nullptr when it's called.
> > > 
> > > I'm not sure I follow.
> > > 
> > > You have added the last update to be the first branch atomically, to make sure there are no race conditions. 
> > > 
> > > Is this about the users calling the thunk while the update is ongoing, which can see the code but still don't have a function pointer?
> > > 
> > > In any case, if there is a race condition, it needs to be fixed, not work by chance.
> > > I'm not sure I follow.
> > > 
> > > You have added the last update to be the first branch atomically, to make sure there are no race conditions.
> > >
> > > Is this about the users calling the thunk while the update is ongoing, which can see the code but still don't have a function pointer?
> > >
> > > In any case, if there is a race condition, it needs to be fixed, not work by chance.
> > 
> > In x86_64 at least, there's no real "race" here because the pointer is implemented as a `std::atomic<function_ptr>`. The sequence here though at a high level should be:
> > 
> > 1. The code executing somehow knew to get to this trampoline.
> > 1. While some threads are running through this trampoline, another updates the global atomic pointer to `nullptr`.
> > 1. All threads that encounter the load for the function pointer should see the updated value of the pointer (since it "happens before" the load of the pointer).
> > 
> > This is at least the reason why the load of the pointer happens after the registers have been stashed onto the stack.
> > 
> > Does that make sense?
> > 2. While some threads are running through this trampoline, another updates the global atomic pointer to nullptr.
> 
> Hum, I wasn't expecting the pointer to move back to nullptr at any time.
> 
> In this case, there's still the race condition that the value changes in between the comparison (`CMP X2, #0`) and the jump (`BEQ`) which is, admittedly, very unlikely, but non-zero.
> Hum, I wasn't expecting the pointer to move back to nullptr at any time.

Yeah, that's a feature (see `__xray_remove_handler()`) to allow for turning off XRay logging at runtime in a sequence that guarantees things will continue to work. To disable XRay at runtime, users may:

1. Call `__xray_remove_handler()` to ensure that the current installed logging handler is removed atomically (in a cross-platform manner).
1. Call `__xray_unpatch()` to return the state of the sleds to "neutral".

Both of these are optional, could be performed in any order (thus operations need to be ensured thread-safe).

> In this case, there's still the race condition that the value changes in between the comparison (CMP X2, #0) and the jump (BEQ) which is, admittedly, very unlikely, but non-zero.

The way we handle this in x86_64 at least is to load the value from the global to a register, then perform the comparison with an immediate value (0). It looks like this code is already doing almost exactly the same thing -- maybe we need to be ensuring that the load is synchronised? My non-familiarity with ARM64 is showing here.


https://reviews.llvm.org/D26413





More information about the llvm-commits mailing list