[Openmp-commits] [PATCH] D95294: [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Sat Jan 23 15:32:59 PST 2021

JonChesterfield added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:142
 DEVICE uint32_t __kmpc_atomic_add(uint32_t *Address, uint32_t Val) {
-  return atomicAdd(Address, Val);
+  return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
 }
----------------
tianshilei1992 wrote:
> JonChesterfield wrote:
> > tianshilei1992 wrote:
> > > JonChesterfield wrote:
> > > > tianshilei1992 wrote:
> > > > > what about using NVVM atomic intrinsics directly? We don't need the memory order then.
> > > > Exposing memory order is a feature. Makes it clear we're using the slow one, gives the hook to change that if we wish.
> > > > 
> > > > Also gives an option to use the same clang intrinsics on amdgpu and nvptx if we wish.
> > > That sounds appealing. Maybe we could use this patch to move atomic operations back to common part, and create another patch to rewrite other CUDA intrinsics related functions.
> > I'd rather land this separately to starting to merge parts of the two target_impl files.
> > 
> > I'm not sure whether seq_cst is the right order for nvptx. Amdgcn should probably be device scope + acquire/release, though it's presently seq_cst as that's what hip did by default. So we may want slightly different Intrinsics again later.
> JFYI, I'm working on cleaning up code to remove `target_impl.h` now.
This is presently orthogonal to D95298 and removes about half of the remaining uses of the cuda sdk. I'd prefer land it independently of hashing out consensus on what the library structure should change to.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95294/new/

https://reviews.llvm.org/D95294