[Openmp-commits] [PATCH] D95294: [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Sat Jan 23 14:40:19 PST 2021

tianshilei1992 added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:142
 DEVICE uint32_t __kmpc_atomic_add(uint32_t *Address, uint32_t Val) {
-  return atomicAdd(Address, Val);
+  return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
 }
----------------
JonChesterfield wrote:
> tianshilei1992 wrote:
> > JonChesterfield wrote:
> > > tianshilei1992 wrote:
> > > > what about using NVVM atomic intrinsics directly? We don't need the memory order then.
> > > Exposing memory order is a feature. Makes it clear we're using the slow one, gives the hook to change that if we wish.
> > > 
> > > Also gives an option to use the same clang intrinsics on amdgpu and nvptx if we wish.
> > That sounds appealing. Maybe we could use this patch to move atomic operations back to common part, and create another patch to rewrite other CUDA intrinsics related functions.
> I'd rather land this separately to starting to merge parts of the two target_impl files.
> 
> I'm not sure whether seq_cst is the right order for nvptx. Amdgcn should probably be device scope + acquire/release, though it's presently seq_cst as that's what hip did by default. So we may want slightly different Intrinsics again later.
JFYI, I'm working on cleaning up code to remove `target_impl.h` now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95294/new/

https://reviews.llvm.org/D95294