[PATCH] D26232: [XRay][compiler-rt] XRay Buffer Queue

Thu Nov 17 10:47:34 PST 2016

rSerge added inline comments.

================
Comment at: lib/xray/xray_buffer_queue.cc:32
+std::error_code BufferQueue::getBuffer(Buffer &Buf) {
+  if (Finalizing.load(std::memory_order_acquire))
+    return std::make_error_code(std::errc::state_not_recoverable);
----------------
dberris wrote:
> rSerge wrote:
> > dberris wrote:
> > > rSerge wrote:
> > > > Shouldn't this be checked inside the mutex lock?
> > > Nope, `Finalizing` is atomic, and is already synchronised -- so we avoid locking the mutex when the BufferQueue is already finalizing.
> > What if `finalize()` is called between `Finalizing.load(std::memory_order_acquire)` and `std::lock_guard<std::mutex> Guard(Mutex);` here? The description for this data structure claims that `getBuffer` requests must be denied after `finalize()` is called, but this doesn't happen in this scenario. To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).
> > What if finalize() is called between Finalizing.load(std::memory_order_acquire) and std::lock_guard<std::mutex> Guard(Mutex); here? The description for this data structure claims that getBuffer requests must be denied after finalize() is called, but this doesn't happen in this scenario.
> 
> But it does, right?
> 
> If one thread called `getBuffer(...)` just before another thread called `finalize()` then the ongoing `getBuffer(...)` should continue because `finalize()` had not technically been called yet when it started. We don't intend to make `finalize()` block on outstanding/ongoing `getBuffer(...)` calls (or the other way around).
> 
> How do I make the documentation on the function clear about this?
> 
> > To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).
> 
> On some platforms, operations on `std::mutex` may be implemented much more heavily compared to an atomic bool (hence the avoidance of locking the mutex in the first place). I'm actually thinking about whether a relaxed load is actually sufficient (at least in x86 it makes it faster) for the check, while using a release store on the `finalize()` side.
> 
> At this point I'm trying to avoid having to manually implement a wait-free or lock-free circular buffer, but maybe that's what this needs to end up becoming. ;)
> 
> Let me think about it a little more.
Have you considered http://en.cppreference.com/w/cpp/atomic/atomic_flag ? There is an example on how to acquire and release it, so to get a "spin lock" you need just to spin more.
I am not sure if "relaxed" will work, but you can avoid 2 acquire operations in a row (1 for `Finalizing`, 1 for the mutex) and instead just lock on the atomic flag, then check whether we are finalizing from a normal variable, then do the work and release the spin lock. It should not be spinning a lot because (if I understood correctly) `finalize` operation is rare and `getBuffer` is not called from multiple threads concurrently.

https://reviews.llvm.org/D26232