[Openmp-commits] [PATCH] D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading
Gheorghe-Teodor Bercea via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Mar 15 16:12:08 PDT 2023
doru1004 added inline comments.
================
Comment at: openmp/libomptarget/DeviceRTL/src/Synchronization.cpp:287
+ while (!atomicCAS((uint32_t *)Lock, UNSET, SET, atomic::relaxed,
+ atomic::relaxed) != UNSET) {
+ __builtin_amdgcn_s_sleep(32);
----------------
jdoerfert wrote:
> jdoerfert wrote:
> > doru1004 wrote:
> > > jdoerfert wrote:
> > > > Shouldn't the release and acquire semantics be part of this CAS?
> > > > If we run relaxed who is to say we see the update of another thread. I would have assumed on failure we want an acquire fence and on success a release fence. Not sure if we need additional ones.
> > > The idea is that the thread only executes the acquire once when it enters the critical region to prevent it from being executed at every iteration of the loop (which comes with performance penalties if it happens).
> > >
> > > I would like to understand why you say there's a chance we don't see the update of another thread. The unsetting of the lock is happening here:
> > > ```
> > > void unsetLock(omp_lock_t *Lock) {
> > > (void)atomicExchange((uint32_t *)Lock, UNSET, atomic::acq_rel);
> > > }
> > > ```
> > The relaxed unset doesn't make other memory effects visible to a thread that takes the lock next. I believe it should come with a proper fence, hence be atomic::release.
> >
> > Similarly, I am not sure if the update of the lock itself is guaranteed to be observed if the update and check are relaxed.
> UnsetLock actually has a release fence, testLock does not have proper fencing, sorry.
I fixed testLock to use acq_rel.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D145831/new/
https://reviews.llvm.org/D145831
More information about the Openmp-commits
mailing list