[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading

Thu Mar 23 17:16:56 PDT 2023

JonChesterfield added a comment.

I'm not confident that an acq_rel exchange in combination with the fences is correct. If it is correct, I think it must be suboptimal. Why have you gone with that implementation?

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:279
+  (void)atomicExchange((uint32_t *)Lock, UNSET, atomic::acq_rel);
+}
+
----------------
I expected unset to be a relaxed store of unset, possibly surrounded by fences. I'm not totally confident an acq_rel ordering here synchronises with the fences in setCriticalLock, though it might do.

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:290
+    fenceKernel(atomic::aquire);
+  }
+}
----------------
The fences probably need to be outside the branch on active thread, assuming the intent is for the lock to cover all the threads in the warp. Though I'm not sure it'll make any difference to codegen.

I think relaxed ordering on both is right here.

================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:473
 }

+void unsetCriticalLock(omp_lock_t *Lock) {
----------------
These read like a copy/paste error - most functions nearby are calling impl:: with a similar name. Could you add a comment to the effect that nvptx uses the same lock implementation for critical sections as for other locks?

================
Comment at: openmp/libomptarget/test/offloading/target_critical_region.cpp:5
+// UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
----------------
why not nvptx? also why not x64, but that seems less immediately worrying

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145831/new/

https://reviews.llvm.org/D145831