[Openmp-commits] [PATCH] D95294: [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sat Jan 23 13:30:25 PST 2021


JonChesterfield added inline comments.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:30
 int __shfl_down_sync(unsigned mask, int var, unsigned detla, int width);
 void __syncwarp(int mask);
 }
----------------
tianshilei1992 wrote:
> `__syncwarp` is left. It can also be replaced simply by `__nvvm_bar_warp_sync(mask)`.
`__syncwarp` is one of the cuda_version intrinsics; I'll handle it with the rest of them.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:142
 DEVICE uint32_t __kmpc_atomic_add(uint32_t *Address, uint32_t Val) {
-  return atomicAdd(Address, Val);
+  return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
 }
----------------
tianshilei1992 wrote:
> What about using NVVM atomic intrinsics directly? We don't need the memory order then.
Exposing the memory order is a feature. It makes clear that we're using the slow one (`__ATOMIC_SEQ_CST`), and it gives us a hook to change that if we wish.

It also gives us the option of using the same clang intrinsics on amdgpu and nvptx if we wish.
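
To illustrate the point about the memory order being a hook, here is a minimal sketch, not code from this patch. `atomic_add_sketch` is a hypothetical helper name, and lifting the order into a template parameter is just one assumed way of exposing it. The builtin and the `__ATOMIC_*` constants are the ones clang and GCC provide, so the same source compiles for any target that supports them.

```cpp
#include <cstdint>

// Hypothetical helper (not part of D95294): wraps the same builtin used in
// the diff, with the memory order lifted into a template parameter.
// The default is sequentially consistent, the strongest and typically the
// slowest ordering; a caller can opt into a weaker one explicitly.
template <int Order = __ATOMIC_SEQ_CST>
inline uint32_t atomic_add_sketch(uint32_t *Address, uint32_t Val) {
  // Like atomicAdd, this returns the value held at Address before the add.
  return __atomic_fetch_add(Address, Val, Order);
}
```

Calling `atomic_add_sketch(&x, 1u)` keeps the sequentially consistent behaviour shown in the diff, while `atomic_add_sketch<__ATOMIC_RELAXED>(&x, 1u)` would be the place to relax it if that were ever judged safe.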


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95294/new/

https://reviews.llvm.org/D95294


