[libcxx-commits] [PATCH] D68480: Implementation of C++20's P1135R6 for libcxx

Olivier Giroux via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Thu Jan 9 14:00:30 PST 2020


__simt__ marked 24 inline comments as done.
__simt__ added a comment.

No problem. I just want to know what I need to do in the next patch in order to move forward.



================
Comment at: libcxx/src/atomic.cpp:28
+
+struct alignas(64) __libcpp_contention_t
+{
----------------
zoecarver wrote:
> For what platforms is `_LIBCPP_HAS_PLATFORM_WAIT_STATE` false, and have you tested on those platforms? I'm worried that there might be compiler errors. 
It's false on CUDA. It would be false on platforms that can't rely on OS support for efficient waiting and have to fall back to polling with backoff.


================
Comment at: libcxx/src/atomic.cpp:32
+    static constexpr __libcpp_platform_contention_t __count_bit = 2;
+    __cxx_atomic_impl<__libcpp_platform_contention_t> __s = ATOMIC_VAR_INIT(0);
+};
----------------
zoecarver wrote:
> What's the point of this macro, `ATOMIC_VAR_INIT` (I realize you didn't add it, but I'm still curious)?
Prior to P0883 merging into C++20, atomics are constructed in an uninitialized state. You're supposed to use this macro to give it a static initialization. This macro is deprecated after C++20.


================
Comment at: libcxx/src/atomic.cpp:46
+
+static constexpr size_t __libcpp_contention_state_size = (1 << 8);  /* < there's no magic in this number */
+__libcpp_contention_t __libcpp_contention_state_[ __libcpp_contention_state_size ];
----------------
zoecarver wrote:
> This is different if `_LIBCPP_HAS_PLATFORM_WAIT_STATE` is false, right?
If you have a platform wait state, then it's used by this facility.

If you don't have a platform wait state, then a condvar is used instead.



================
Comment at: libcxx/src/atomic.cpp:129
+    auto const __wait_bit = __libcpp_contention_t::__wait_bit;
+    if(__wait_bit & __cxx_atomic_fetch_and(__s, ~__wait_bit, memory_order_relaxed))
+        __libcpp_platform_wake(__s, __all);
----------------
zoecarver wrote:
> Will this ever be true? 
Oh yes. Every time that there is a contending waiter concurrent with this notifier. This condition is true whenever the facility's use is not trivial.


================
Comment at: libcxx/src/atomic.cpp:133
+_LIBCPP_FUNC_VIS
+void __libcpp_platform_contention_monitor_for_wait(void const volatile* __location, 
+                                                   void* __contention_state,
----------------
zoecarver wrote:
> same as below
Answered below.


================
Comment at: libcxx/src/atomic.cpp:139
+    auto* const __ret = (__libcpp_platform_contention_t*)__old_value;
+#ifndef _LIBCPP_HAS_NO_PLATFORM_WAIT_TABLE
+    if(nullptr == __s)
----------------
zoecarver wrote:
> Is there a test for this case? What will be the effect of `__s` not getting updated? 
Undefined behavior. Usually a segfault.


================
Comment at: libcxx/src/atomic.cpp:148
+_LIBCPP_FUNC_VIS
+void __libcpp_platform_contention_wait_and_cleanup(void const volatile* __location, 
+                                                   bool __cond,
----------------
zoecarver wrote:
> Instead of having void pointers that are casted, I don't see any reason these couldn't be defined as their actual types (`__cxx_atomic_impl*` and `__libcpp_platform_contention_t*`).
This is the uglier part of the patch. We have this situation where one platform (Apple) is about to go from not having platform wait states, to having them. This API is trying to be able to take either kind of efficient wait structure in the application OR in the dylib, and provide efficient waiting either way.

I think it would not be unreasonable if you guys asked me to cut this part out in order to get a first commit of the facility, and then work in the background with Louis on recovering the capability in some way, with or without a public patch.


================
Comment at: libcxx/src/atomic.cpp:162
+    static constexpr timespec __timeout = { 2, 0 };
+    __libcpp_platform_wait(__s, *__old, sizeof(__libcpp_platform_contention_t) > 4 ? nullptr : &__timeout);
+}
----------------
zoecarver wrote:
> Is `__libcpp_platform_wait` defined on non-linux machines?
Yes.


================
Comment at: libcxx/src/barrier.cpp:44
+#ifndef _LIBCPP_HAS_NO_THREAD_FAVORITE_BARRIER_INDEX
+            if(0 == __round) {
+                if(__current >= __current_expected)
----------------
zoecarver wrote:
> Will this ever not //only// happen on the first iteration (if so, move it out of the loop maybe)?
This needs to be inside the loop unfortunately. During the first round, we need to record our effective start (leaf) location in the tree. I could express it differently - record it only at the end of the first round - but it would remain in the loop nest.

What we could potentially do is dumb down this favorite barrier index concept, or remove it entirely. It's worth a relatively small amount of performance by comparison to using the tree barrier in the first place.

I would not be offended if you asked me to give you an ordered list of things I could delete and tell you what it costs you to delete them, approximately. Then we could stop where we are more comfortable.


================
Comment at: libcxx/src/semaphore.cpp:27
+__sem_semaphore_basic_base::~__sem_semaphore_basic_base() {
+#ifdef __APPLE__
+    auto __b = __balance.load(memory_order_relaxed);
----------------
zoecarver wrote:
> Why is this only needed for apple?
Because of how GCD semaphores work, unfortunately.

We could delete this by sending all Apple semaphores to the generic template based on atomics.

Last I spoke with Louis, we thought that would be acceptable.


================
Comment at: libcxx/src/semaphore.cpp:123
+        return 0 != (__old >> 32);
+    }, chrono::microseconds(50));
+    while(1) {
----------------
zoecarver wrote:
> Where does `50` come from? Maybe make this a macro. 
Yeah. Aren't there enough macros though? I think I might come back later with a different patch to propose a set of macros to configure back-offs.


================
Comment at: libcxx/src/semaphore.cpp:152
+    uint64_t __old = 0;
+    while(0 == (__old & ~0ul))
+        if(__frontbuffer.compare_exchange_weak(__old, __old + (uint64_t(__update) << 32), memory_order_acq_rel))
----------------
zoecarver wrote:
> Is this the same as `while (__old == 0)`?
It's not. This is masking the lower 32-bits of a 64-bit value, and then comparing that with 0.


Repository:
  rCXX libc++

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68480/new/

https://reviews.llvm.org/D68480





More information about the libcxx-commits mailing list