[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Mar 23 17:16:56 PDT 2023
JonChesterfield added a comment.
I'm not confident that an acq_rel exchange in combination with the fences is correct. If it is correct, I think it must be suboptimal. Why have you gone with that implementation?
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:279
+ (void)atomicExchange((uint32_t *)Lock, UNSET, atomic::acq_rel);
+}
+
----------------
I expected unset to be a relaxed store of unset, possibly surrounded by fences. I'm not totally confident an acq_rel ordering here synchronises with the fences in setCriticalLock, though it might do.
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:290
+ fenceKernel(atomic::aquire);
+ }
+}
----------------
The fences probably need to be outside the branch on active thread, assuming the intent is for the lock to cover all the threads in the warp. Though I'm not sure it'll make any difference to codegen.
I think relaxed ordering on both is right here.
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:473
}
+void unsetCriticalLock(omp_lock_t *Lock) {
----------------
These read like a copy/paste error - most functions nearby are calling impl:: with a similar name. Could you add a comment to the effect that nvptx uses the same lock implementation for critical sections as for other locks?
================
Comment at: openmp/libomptarget/test/offloading/target_critical_region.cpp:5
+// UNSUPPORTED: nvptx64-nvidia-cuda-LTO
+// UNSUPPORTED: x86_64-pc-linux-gnu
+// UNSUPPORTED: x86_64-pc-linux-gnu-LTO
----------------
why not nvptx? also why not x64, but that seems less immediately worrying
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145831/new/
https://reviews.llvm.org/D145831
More information about the Openmp-commits
mailing list