[PATCH] D26232: [XRay][compiler-rt] XRay Buffer Queue

Thu Nov 17 16:11:54 PST 2016

dberris planned changes to this revision.
dberris added a comment.

I'll be adding more representative use-case tests (making sure we're doing the right thing when functions are called in multiple threads).

================
Comment at: lib/xray/xray_buffer_queue.cc:32
+std::error_code BufferQueue::getBuffer(Buffer &Buf) {
+  if (Finalizing.load(std::memory_order_acquire))
+    return std::make_error_code(std::errc::state_not_recoverable);
----------------
rSerge wrote:
> dberris wrote:
> > rSerge wrote:
> > > dberris wrote:
> > > > rSerge wrote:
> > > > > Shouldn't this be checked inside the mutex lock?
> > > > Nope, `Finalizing` is atomic, and is already synchronised -- so we avoid locking the mutex when the BufferQueue is already finalizing.
> > > What if `finalize()` is called between `Finalizing.load(std::memory_order_acquire)` and `std::lock_guard<std::mutex> Guard(Mutex);` here? The description for this data structure claims that `getBuffer` requests must be denied after `finalize()` is called, but this doesn't happen in this scenario. To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).
> > > What if finalize() is called between Finalizing.load(std::memory_order_acquire) and std::lock_guard<std::mutex> Guard(Mutex); here? The description for this data structure claims that getBuffer requests must be denied after finalize() is called, but this doesn't happen in this scenario.
> > 
> > But it does, right?
> > 
> > If one thread called `getBuffer(...)` just before another thread called `finalize()` then the ongoing `getBuffer(...)` should continue because `finalize()` had not technically been called yet when it started. We don't intend to make `finalize()` block on outstanding/ongoing `getBuffer(...)` calls (or the other way around).
> > 
> > How do I make the documentation on the function clear about this?
> > 
> > > To enforce that invariant, the finalization flag must be set and read with mutex locked, so the flag itself doesn't have to be atomic (because the mutex provides the necessary acquire/release semantics).
> > 
> > On some platforms, operations on `std::mutex` may be implemented much more heavily compared to an atomic bool (hence the avoidance of locking the mutex in the first place). I'm actually thinking about whether a relaxed load is actually sufficient (at least in x86 it makes it faster) for the check, while using a release store on the `finalize()` side.
> > 
> > At this point I'm trying to avoid having to manually implement a wait-free or lock-free circular buffer, but maybe that's what this needs to end up becoming. ;)
> > 
> > Let me think about it a little more.
> Have you considered http://en.cppreference.com/w/cpp/atomic/atomic_flag ? There is an example on how to acquire and release it, so to get a "spin lock" you need just to spin more.
> I am not sure if "relaxed" will work, but you can avoid 2 acquire operations in a row (1 for `Finalizing`, 1 for the mutex) and instead just lock on the atomic flag, then check whether we are finalizing from a normal variable, then do the work and release the spin lock. It should not be spinning a lot because (if I understood correctly) `finalize` operation is rare and `getBuffer` is not called from multiple threads concurrently.
> Have you considered http://en.cppreference.com/w/cpp/atomic/atomic_flag ? There is an example on how to acquire and release it, so to get a "spin lock" you need just to spin more.

I have considered atomic_flag, but see no advantage to using that instead of `std::atomic<bool>`. I also don't need to spin-lock here, I just need to make sure that as soon as `finalize()` is done, all future `getBuffer()` calls will fail reliably.

> I am not sure if "relaxed" will work, but you can avoid 2 acquire operations in a row (1 for Finalizing, 1 for the mutex) and instead just lock on the atomic flag, then check whether we are finalizing from a normal variable, then do the work and release the spin lock. It should not be spinning a lot because (if I understood correctly) finalize operation is rare and getBuffer is not called from multiple threads concurrently.

`getBuffer()` is meant to be called in all threads that need a buffer (consider each thread to be writing to its own buffer) so this has to be synchronised appropriately. Now I could just use a spinlock, assuming that the `pop_front()` operation on a `std::deque<...>` will be fast enough, but pay dearly when the process is preempted in the middle of that critical section (i.e. while other threads are spinning).

I suppose I should add a set of tests to make sure we're fine here, let me do that next.

https://reviews.llvm.org/D26232