[PATCH] D26232: [XRay][compiler-rt] XRay Buffer Queue

Wed Nov 16 17:32:59 PST 2016

dberris added inline comments.

================
Comment at: lib/xray/xray_buffer_queue.cc:32
+std::error_code BufferQueue::getBuffer(Buffer &Buf) {
+  if (Finalizing.load(std::memory_order_acquire))
+    return std::make_error_code(std::errc::state_not_recoverable);
----------------
rSerge wrote:
> dberris wrote:
> > rSerge wrote:
> > > Shouldn't this be checked inside the mutex lock?
> > Nope, `Finalizing` is atomic, and is already synchronised -- so we avoid locking the mutex when the BufferQueue is already finalizing.
> What if `finalize()` is called between `Finalizing.load(std::memory_order_acquire)` and `std::lock_guard<std::mutex> Guard(Mutex);` here? The description for this data structure claims that `getBuffer` requests must be denied after `finalize()` is called, but this doesn't happen in this scenario. To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).
> What if finalize() is called between Finalizing.load(std::memory_order_acquire) and std::lock_guard<std::mutex> Guard(Mutex); here? The description for this data structure claims that getBuffer requests must be denied after finalize() is called, but this doesn't happen in this scenario.

But it does, right?

If one thread called `getBuffer(...)` just before another thread called `finalize()` then the ongoing `getBuffer(...)` should continue because `finalize()` had not technically been called yet when it started. We don't intend to make `finalize()` block on outstanding/ongoing `getBuffer(...)` calls (or the other way around).

How do I make the documentation on the function clear about this?

> To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).

On some platforms, operations on `std::mutex` may be implemented much more heavily compared to an atomic bool (hence the avoidance of locking the mutex in the first place). I'm actually thinking about whether a relaxed load is actually sufficient (at least in x86 it makes it faster) for the check, while using a release store on the `finalize()` side.

At this point I'm trying to avoid having to manually implement a wait-free or lock-free circular buffer, but maybe that's what this needs to end up becoming. ;)

Let me think about it a little more.

https://reviews.llvm.org/D26232