[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Mar 14 14:13:50 PDT 2023
jdoerfert added a comment.
Maybe we should define `omp_lock_t` as uint32_t on AMDGPU?
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:275
+ return false;
+ setLock(Lock);
+ return true;
----------------
This doesn't work. You check the lock first, then you set it.
It has to be an atomic step. Use an atomicCAS, the result tells you if it worked/if it is now set. This is the setLock code w/o the while. You can even call this in the while below.
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:283
+void setLock(omp_lock_t *Lock) {
+ uint64_t lowestActiveThread = utils::ffs(mapping::activemask()) - 1;
+ if (mapping::getThreadIdInWarp() == lowestActiveThread) {
----------------
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:287
+ while (!atomicCAS((uint32_t *)Lock, UNSET, SET, atomic::relaxed,
+ atomic::relaxed) != UNSET) {
+ __builtin_amdgcn_s_sleep(32);
----------------
Shouldn't the release and acquire semantics be part of this CAS?
If we run relaxed who is to say we see the update of another thread. I would have assumed on failure we want an acquire fence and on success a release fence. Not sure if we need additional ones.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145831/new/
https://reviews.llvm.org/D145831
More information about the Openmp-commits
mailing list