[libcxx-commits] [PATCH] D117372: [libcxx] switch locale from using std::call_once to __libcpp_mutex_t

Wed Mar 16 18:45:41 PDT 2022

DanielMcIntosh-IBM added a comment.

Follow up on @EricWF's comments based on a video chat with him:
Since `id::__get` is apparently hit quite often, the extra performance we might get here because of the relaxed memory ordering instead of acquire/release ordering could potentially be significant. I do have to run some benchmarks to double check, but I would be shocked if this somehow hurt performance.
Comparing the old sequence to the new sequence side by side makes this pretty obvious I think.

The common/happy path:

  old version using std::call_once                                    |   new version using __threading_support directly

  call into std::call_once                                            <=  -
  atomic_load(id::__flag_.__state_, _AO_Acquire)                      <=  atomic_load(id::__id_, _AO_Relaxed)
  -                                                                   ..  assign load result to local variable
  compare load result NEQ ~0                                          ==  compare local_variable NEQ 0
  //(False)
  return from std::call_once                                          <=  -
  return id::__id_ - 1                                                <   return local_variable - 1

To say nothing of the uncommon path:

  old version using std::call_once                                    |   new version using __threading_support directly

  call into std::call_once                                            <=  -
  atomic_load(id::__flag_.__state_, _AO_Acquire)                      <=  atomic_load(id::__id_, _AO_Relaxed)
  -                                                                   ..  assign load result to local_variable
  compare load result NEQ ~0                                          ==  compare local_variable NEQ 0
  {
      junk wrapper code                                               <=  -
      call into __call_once                                           <=  -
  //(NO_THREADS):
      compare (volatile) id::__flag_.__state_ EQ 0                    <=  compare local_variable EQ 0
      {
          start try block                                             <=  -
          assign 1 to (volatile) id::__flag_.__state_                 <   -
          junk wrapper code                                           <=  -
          atomic_add(&__next_id, 1, _AO_Seq);                         <=  increment __next_id
          -                                                           ..  assign __next_id to local_variable
          assign __next_id to id::__id_                               ??  atomic_store(id::__id_, local_variable, _AO_Relaxed)
          assign ~0 to (volatile) id::__flag_.__state_                <   -
          catch block/end of try-block                                <=  -
      }
  //(THREADS):
      lock mutex common to all std::call_once calls                   <=  lock mutex common to all id::__get calls (a subset of the previous calls to std::call_once)
      -                                                               ..  assign id::__id_ to local_variable
      compare (volatile) id::__flag_.__state_ EQ 1                    <   -
          condvar_wait                                                <   -
      compare (volatile) id::__flag_.__state_ EQ 0                    <=  compare local_variable EQ 0
      {
          start try block                                             <=  -
          atomic_store(id::__flag_.__state_, 1, _AO_Relaxed)          <   -
          unlock mutex                                                <   -
          junk wrapper code                                           <=  -
          atomic_add(&__next_id, 1, _AO_Seq);                         <=  increment __next_id
          -                                                           ..  assign __next_id to local_variable
          assign __next_id to id::__id_                               <   -
          re-lock mutex                                               <   -
          atomic_store(id::__flag_.__state_, ~0, _AO_Relaxed)         ==  atomic_store(id::__id_, local_variable, _AO_Relaxed)
          unlock mutex                                                ==  unlock mutex
          condvar_broadcast                                           <   -
          catch block/end of try-block                                <=  -
      }
      else
          unlock mutex                                                ==  unlock mutex
  }
  return from std::call_once                                          <=  -
  return id::__id_ - 1                                                <   return local_variable - 1

Once this becomes more of a priority for us, I'll run some actual benchmarks to make sure before I consider landing this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117372/new/

https://reviews.llvm.org/D117372