[libcxx-commits] [PATCH] D117372: [libcxx] switch locale from using std::call_once to __libcpp_mutex_t
Daniel McIntosh via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Mar 16 18:45:41 PDT 2022
DanielMcIntosh-IBM added a comment.
Following up on @EricWF's comments, based on a video chat with him:
Since `id::__get` is apparently hit quite often, the extra performance we might get here from using relaxed memory ordering instead of acquire/release ordering could be significant. I still need to run some benchmarks to double-check, but I would be shocked if this change somehow hurt performance.
Comparing the old sequence to the new sequence side by side makes this pretty obvious, I think.
The common/happy path:
old version using std::call_once | new version using __threading_support directly
call into std::call_once <= -
atomic_load(id::__flag_.__state_, _AO_Acquire) <= atomic_load(id::__id_, _AO_Relaxed)
- .. assign load result to local_variable
compare load result NEQ ~0 == compare local_variable NEQ 0
//(False)
return from std::call_once <= -
return id::__id_ - 1 < return local_variable - 1
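To make the happy path concrete, here is a rough sketch of what the new fast path reduces to. This is illustrative only: it uses `std::atomic` with `memory_order_relaxed` as a stand-in for the internal helpers the patch actually goes through, and the names `__get`, `__id_`, and `__get_slow_path` are approximations rather than the literal code.

```
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the real locale::id member; 0 means "no id assigned yet".
std::atomic<int32_t> __id_{0};

int32_t __get_slow_path(); // takes the shared mutex; sketched after the next table

int32_t __get() {
    // One relaxed load, one compare, one subtract -- no call into std::call_once.
    int32_t __result = __id_.load(std::memory_order_relaxed);
    if (__result != 0)
        return __result - 1; // ids are stored biased by one so that 0 can mean "unset"
    return __get_slow_path();
}
```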
To say nothing of the uncommon path:
old version using std::call_once | new version using __threading_support directly
call into std::call_once <= -
atomic_load(id::__flag_.__state_, _AO_Acquire) <= atomic_load(id::__id_, _AO_Relaxed)
- .. assign load result to local_variable
compare load result NEQ ~0 == compare local_variable NEQ 0
{
junk wrapper code <= -
call into __call_once <= -
//(NO_THREADS):
compare (volatile) id::__flag_.__state_ EQ 0 <= compare local_variable EQ 0
{
start try block <= -
assign 1 to (volatile) id::__flag_.__state_ < -
junk wrapper code <= -
atomic_add(&__next_id, 1, _AO_Seq); <= increment __next_id
- .. assign __next_id to local_variable
assign __next_id to id::__id_ ?? atomic_store(id::__id_, local_variable, _AO_Relaxed)
assign ~0 to (volatile) id::__flag_.__state_ < -
catch block/end of try-block <= -
}
//(THREADS):
lock mutex common to all std::call_once calls <= lock mutex common to all id::__get calls (a subset of the previous calls to std::call_once)
- .. assign id::__id_ to local_variable
compare (volatile) id::__flag_.__state_ EQ 1 < -
condvar_wait < -
compare (volatile) id::__flag_.__state_ EQ 0 <= compare local_variable EQ 0
{
start try block <= -
atomic_store(id::__flag_.__state_, 1, _AO_Relaxed) < -
unlock mutex < -
junk wrapper code <= -
atomic_add(&__next_id, 1, _AO_Seq); <= increment __next_id
- .. assign __next_id to local_variable
assign __next_id to id::__id_ < -
re-lock mutex < -
atomic_store(id::__flag_.__state_, ~0, _AO_Relaxed) == atomic_store(id::__id_, local_variable, _AO_Relaxed)
unlock mutex == unlock mutex
condvar_broadcast < -
catch block/end of try-block <= -
}
else
unlock mutex == unlock mutex
}
return from std::call_once <= -
return id::__id_ - 1 < return local_variable - 1
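And a matching sketch of the uncommon path, i.e. the `__get_slow_path` referenced above: one mutex shared by every `id::__get` slow path, a re-check of `__id_` under the lock, and a relaxed store of the freshly assigned id. Again, `std::mutex`/`std::atomic` and the names `__id_mutex`/`__next_id` are stand-ins for the `__libcpp` primitives, and `__next_id` is shown as a plain counter protected by the lock even though the real code may keep it atomic.

```
#include <atomic>
#include <cstdint>
#include <mutex>

extern std::atomic<int32_t> __id_; // defined alongside the fast path above

namespace {
std::mutex __id_mutex;   // one mutex shared by every id::__get slow path
int32_t __next_id = 0;   // only modified while holding __id_mutex in this sketch
}

int32_t __get_slow_path() {
    std::lock_guard<std::mutex> __lock(__id_mutex);
    // Re-read under the lock: another thread may have assigned the id between
    // our relaxed load on the fast path and our acquisition of the mutex.
    int32_t __result = __id_.load(std::memory_order_relaxed);
    if (__result == 0) {
        __result = ++__next_id;                           // hand out the next id
        __id_.store(__result, std::memory_order_relaxed); // publish it (relaxed, per the table)
    }
    return __result - 1;
}
```

Since the initialization branch is only taken until the id has been assigned, the cost that matters in practice is the single relaxed load and compare on the fast path.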
Once this becomes more of a priority for us, I'll run some actual benchmarks to confirm before I consider landing this.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D117372/new/
https://reviews.llvm.org/D117372