[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading

Thu Mar 16 10:16:39 PDT 2023

dhruvachak added inline comments.

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:269
+void unsetLock(omp_lock_t *Lock) {
+  (void)atomicExchange((uint32_t *)Lock, UNSET, atomic::acq_rel);
+}
----------------
In theory, unset could simply do an atomic write to the Lock with release memorder. Any reason that's not used here? You are not using the prev value anyways.

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:284
+    fenceKernel(atomic::release);
+    while (!atomicCAS((uint32_t *)Lock, UNSET, SET, atomic::relaxed,
+                      atomic::relaxed) != UNSET) {
----------------
I am getting confused by this pattern. You have:

while (!atomicCAS(...) != UNSET)

Can't the above be written as 

while (atomicCAS(...) == UNSET)

But what does atomicCAS return? Bool or the prev value? If it is the prev value, shouldn't the condition be?

while (atomicCAS(...) == SET) { sleep(); }

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:285
+    while (!atomicCAS((uint32_t *)Lock, UNSET, SET, atomic::relaxed,
+                      atomic::relaxed) != UNSET) {
+      __builtin_amdgcn_s_sleep(32);
----------------
I think the CAS should take an acq_rel as the success memorder and an acq as the failure memorder. 

The acq in both the cases will ensure it sees an update from another thread. The release from the other updating thread is not sufficient for this thread to see that update. And when this thread succeeds, the release will ensure other threads see the update of this thread. 

With the above memorders, I would think we could get rid of the fences.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145831/new/

https://reviews.llvm.org/D145831