[cfe-dev] [libcxx] optimizing shared_ptr atomics in destructors
Craig, Ben via cfe-dev
cfe-dev at lists.llvm.org
Tue Jul 19 08:12:40 PDT 2016
Relevant review:
https://reviews.llvm.org/D22470
I just updated it with an extra benchmark. The performance of weak_ptr
decrements on x86 did get significantly worse, but I think that's a fair
trade-off. For every weak_ptr in the wild, there are likely ten
shared_ptrs that should be unique_ptrs that will benefit from this change.
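[Editor's note: the benchmark itself isn't reproduced in the thread; a minimal sketch of the kind of loop that would exercise this path (function name and iteration count are my own, not taken from D22470):]

```cpp
#include <chrono>
#include <memory>

// Time the make_shared<int> / destroy round trip the patch targets.
// Destroying the last shared_ptr runs the strong decrement, the weak
// decrement (the one the patch turns into a load), and the deallocation.
double ns_per_iteration(int iters) {
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < iters; ++i) {
        std::shared_ptr<int> p = std::make_shared<int>(i);
        (void)p; // p is destroyed at the end of each loop body
    }
    auto stop = clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    return static_cast<double>(ns.count()) / iters;
}
```

Such a loop measures the heap allocation and deallocation together with the atomics, which is why an 8% overall improvement is notable.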
On 7/18/2016 12:29 PM, JF Bastien wrote:
> On Mon, Jul 18, 2016 at 8:31 AM, Craig, Ben <ben.craig at codeaurora.org
> <mailto:ben.craig at codeaurora.org>> wrote:
>
> Currently, when the last shared_ptr to an object is destroyed,
> libc++ performs two atomic decrements, one for the "strong" shared
> count, and one for the "weak" shared count. I think we can do
> better than this in the uncontended case, but I would like some
> feedback for this optimization, particularly on the ARM side.
>
> Here's the code change...
> diff --git a/src/memory.cpp b/src/memory.cpp
> index 08f2259..b459eb1 100644
> --- a/src/memory.cpp
> +++ b/src/memory.cpp
> @@ -30,12 +30,12 @@ increment(T& t) _NOEXCEPT
>      return __libcpp_atomic_add(&t, 1, _AO_Relaxed);
>  }
>
>  template <class T>
>  inline T
>  decrement(T& t) _NOEXCEPT
>  {
>      return __libcpp_atomic_add(&t, -1, _AO_Acq_Rel);
>  }
>
>  } // namespace
> @@ -96,7 +96,9 @@ __shared_weak_count::__release_shared() _NOEXCEPT
>  void
>  __shared_weak_count::__release_weak() _NOEXCEPT
>  {
> -    if (decrement(__shared_weak_owners_) == -1)
> +    if (__libcpp_atomic_load(&__shared_weak_owners_, _AO_Acquire) == 0)
> +        __on_zero_shared_weak();
> +    else if (decrement(__shared_weak_owners_) == -1)
>          __on_zero_shared_weak();
>  }
>
>     The general idea is that if the current thread is destroying the
>     last weak reference, then no other thread can legally be accessing
>     this object. Given that, we can avoid an expensive atomic
>     read-modify-write and use a plain load instead.
> On x86_64, a quick-and-dirty benchmark is showing an 8%
> improvement in performance for the combination of make_shared<int>
> and the accompanying destruction. I don't have performance
> numbers for other architectures at this point. That 8% is pretty
> promising though, as the atomic operation improvements are showing
> through, despite being measured along with a heap allocation and
> deallocation.
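[Editor's note: the shape of the change can be modeled outside libc++'s internals with std::atomic. A simplified sketch, using libc++'s convention that a stored count of 0 means one outstanding reference, so the decrement that yields -1 released the last one:]

```cpp
#include <atomic>

// Simplified model of __shared_weak_count's weak-count release;
// not the actual libc++ class, just the counting logic.
struct shared_weak_model {
    std::atomic<long> shared_weak_owners{0}; // 0 == one reference outstanding
    bool destroyed = false;

    void on_zero_shared_weak() { destroyed = true; }

    // Patched path: if this thread holds the only remaining reference, an
    // acquire load suffices; no other thread can touch the count anymore.
    void release_weak() {
        if (shared_weak_owners.load(std::memory_order_acquire) == 0)
            on_zero_shared_weak();
        else if (shared_weak_owners.fetch_add(-1, std::memory_order_acq_rel) - 1 == -1)
            on_zero_shared_weak();
    }
};
```

(fetch_add returns the old value, so `- 1` reconstructs the new value that libc++'s `decrement` compares against -1.)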
>
>
> Do you have a repo with this benchmark?
>
>
> Note that this optimization wouldn't be safe for the strong count,
> as the last strong count decrement can still contend with a
> weak_ptr::lock() call.
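[Editor's note: the hazard is that weak_ptr::lock() revives the strong count with a compare-exchange, so another thread can be operating on the counter right up to the point where a plain load would observe "last reference". A simplified model of that lock loop (my own sketch; libc++'s convention of -1 meaning no owners left):]

```cpp
#include <atomic>

std::atomic<long> shared_owners{0}; // 0 == one strong owner outstanding

// weak_ptr::lock(), modeled: try to bump the strong count unless the
// object is already gone (-1). Because this CAS can run concurrently with
// the final strong decrement, the "load first, skip the RMW" shortcut is
// unsafe for the strong count: a lock() racing between the load and the
// destruction could hand out a shared_ptr to a destroyed object.
bool try_lock() {
    long owners = shared_owners.load(std::memory_order_relaxed);
    while (owners != -1) {
        if (shared_owners.compare_exchange_weak(owners, owners + 1,
                                                std::memory_order_acq_rel,
                                                std::memory_order_relaxed))
            return true; // acquired a new strong reference
    }
    return false; // object already destroyed
}
```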
>
>     This comes at the cost of adding an extra load acquire for all but
>     the last decrement (and sometimes even the last decrement). On
>     x86, this is really cheap (just a regular mov). Currently on
>     aarch64 you get an extra ldar (lda on 32-bit armv8), and with armv7
>     you get extra barriers.
>
> I would hope / expect that on LL/SC architectures, the first
> acquire load could be folded with the locked load in the atomic
> add. The check and branch (inside the ll / sc loop) would then be
> the only overhead. Is this a reasonable optimization to hope for
> in the future on the compiler front?
>
>
> What do you mean exactly, could you provide assembly? I think I
> understand (sounds clever & doable), but assembly is easier :-)
> That can be benchmarked as well.
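[Editor's note: the hoped-for folding can at least be expressed at the C++ level. An illustrative sketch only; whether an LL/SC target actually reuses the acquire load as the load-linked (e.g. ldaxr on aarch64) and keeps the zero test as a branch inside the loop is up to the compiler:]

```cpp
#include <atomic>

// Returns true when the caller released the last reference and should run
// __on_zero_shared_weak(). The initial acquire load doubles as the first
// iteration's load; compare_exchange_weak plays the store-conditional and
// reloads `v` on failure, mirroring an ll/sc retry loop.
bool release_weak_folded(std::atomic<long>& owners) {
    long v = owners.load(std::memory_order_acquire);
    while (v != 0) {
        if (owners.compare_exchange_weak(v, v - 1, std::memory_order_acq_rel,
                                         std::memory_order_acquire))
            return false; // decrement committed; not the last reference
    }
    return true; // v == 0: this was the only reference left
}
```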
>
>
> Also, I'm being a bit conservative here by making my atomic load
> an acquire operation. It might be safe to make the operation
> relaxed, but that seems risky to me, as __on_zero_shared_weak may
> end up touching unsynchronized data in those cases.
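[Editor's note: the concern is that the thread running __on_zero_shared_weak() must observe every write other threads made before their decrements; the decrement's release half pairs with this load's acquire half. A minimal illustration of that pairing, with names of my own invention:]

```cpp
#include <atomic>
#include <thread>

int payload = 0;             // stands in for data the control block guards
std::atomic<long> owners{1}; // two weak references outstanding (libc++ style)

void releasing_thread() {
    payload = 42;                                    // write before releasing
    owners.fetch_add(-1, std::memory_order_acq_rel); // release half publishes it
}

int observed = -1;

void destroying_thread() {
    // Spin until the acquire load sees we hold the only reference left.
    // Acquire pairs with the releasing thread's release, so its earlier
    // write to payload is guaranteed visible here.
    while (owners.load(std::memory_order_acquire) != 0) { }
    observed = payload; // safe: happens-after the other thread's write
}
```

With a relaxed load instead, nothing would order `payload` before `observed = payload`, which is the unsynchronized-data risk Ben describes.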
>
>
> I haven't thought enough about shared_ptr to convince myself either
> way. Would be good to benchmark to see if it's even worth proving.
--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project