[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading

Johannes Doerfert via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Mar 14 14:13:50 PDT 2023


jdoerfert added a comment.

Maybe we should define `omp_lock_t` as uint32_t on AMDGPU?



================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:275
+    return false;
+  setLock(Lock);
+  return true;
----------------
This doesn't work. You check the lock first, then you set it.
It has to be an atomic step. Use an atomicCAS, the result tells you if it worked/if it is now set. This is the setLock code w/o the while. You can even call this in the while below.


================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:283
+void setLock(omp_lock_t *Lock) {
+  uint64_t lowestActiveThread = utils::ffs(mapping::activemask()) - 1;
+  if (mapping::getThreadIdInWarp() == lowestActiveThread) {
----------------



================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:287
+    while (!atomicCAS((uint32_t *)Lock, UNSET, SET, atomic::relaxed,
+                      atomic::relaxed) != UNSET) {
+      __builtin_amdgcn_s_sleep(32);
----------------
Shouldn't the release and acquire semantics be part of this CAS?
If we run relaxed who is to say we see the update of another thread. I would have assumed on failure we want an acquire fence and on success a release fence. Not sure if we need additional ones.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145831/new/

https://reviews.llvm.org/D145831



More information about the Openmp-commits mailing list