[Openmp-commits] [PATCH] D95294: [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Sat Jan 23 13:46:36 PST 2021

tianshilei1992 added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:142
 DEVICE uint32_t __kmpc_atomic_add(uint32_t *Address, uint32_t Val) {
-  return atomicAdd(Address, Val);
+  return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
 }
----------------
JonChesterfield wrote:
> tianshilei1992 wrote:
> > What about using the NVVM atomic intrinsics directly? We don't need the memory order then.
> Exposing the memory order is a feature. It makes it clear we're using the slow one (seq_cst) and gives us a hook to change that if we wish.
> 
> It also gives us the option to use the same clang intrinsics on amdgpu and nvptx if we wish.
That sounds appealing. Maybe we could use this patch to move the atomic operations back to the common part, and create a separate patch to rewrite the other functions that rely on CUDA intrinsics.
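
As a rough illustration of what a target-independent version in the common part might look like, here is a minimal sketch. The names `atomicAddCommon` and `kmpc_atomic_add_sketch` are hypothetical and not part of the patch; the only real API used is the `__atomic_fetch_add` builtin and the `__ATOMIC_*` constants already shown in the diff. The memory order is a template parameter here purely as an assumption about how the "hook" could be exposed.

```cpp
#include <cstdint>

// Hypothetical common-part helper (illustrative name, not from the patch).
// The memory order is explicit, so the choice of ordering is visible and
// can be relaxed later without touching the target-specific code.
template <int Order = __ATOMIC_SEQ_CST>
inline uint32_t atomicAddCommon(uint32_t *Address, uint32_t Val) {
  // Clang/GCC builtin: atomically adds Val and returns the previous value.
  return __atomic_fetch_add(Address, Val, Order);
}

// Same shape as the entry point in the diff, written without CUDA primitives.
inline uint32_t kmpc_atomic_add_sketch(uint32_t *Address, uint32_t Val) {
  return atomicAddCommon<__ATOMIC_SEQ_CST>(Address, Val);
}
```

A caller that can tolerate weaker ordering could then write `atomicAddCommon<__ATOMIC_RELAXED>(Address, Val)` instead, which is the kind of change the explicit memory order is meant to allow.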

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95294/new/

https://reviews.llvm.org/D95294