[Openmp-commits] [PATCH] D95294: [libomptarget][nvptx] Replace cuda atomic primitives with clang intrinsics
Shilei Tian via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Sat Jan 23 13:46:36 PST 2021
tianshilei1992 added inline comments.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:142
DEVICE uint32_t __kmpc_atomic_add(uint32_t *Address, uint32_t Val) {
- return atomicAdd(Address, Val);
+ return __atomic_fetch_add(Address, Val, __ATOMIC_SEQ_CST);
}
----------------
JonChesterfield wrote:
> tianshilei1992 wrote:
> > What about using the NVVM atomic intrinsics directly? We wouldn't need the memory order then.
> Exposing the memory order is a feature. It makes clear that we're using the slow one (sequentially consistent) and gives us a hook to change that if we wish.
>
> It also gives us the option to use the same clang intrinsics on amdgpu and nvptx if we wish.
That sounds appealing. Maybe we could use this patch to move the atomic operations back to the common part, and create a separate patch to rewrite the other functions that rely on CUDA intrinsics.
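To make the point about the exposed memory order concrete, here is a minimal host-side sketch, not the code from this patch. The name `sketch_atomic_add` and the template parameter `Order` are hypothetical, chosen only for illustration. The only thing taken from the diff is the `__atomic_fetch_add` builtin and `__ATOMIC_SEQ_CST`. It assumes a compiler that provides the GCC-style `__atomic_*` builtins (clang and GCC both do).

```cpp
#include <cstdint>

// Hypothetical wrapper, for illustration only: not part of the deviceRTL.
// The memory order is an explicit parameter that defaults to the
// sequentially consistent ordering used in the diff above.
template <int Order = __ATOMIC_SEQ_CST>
static inline uint32_t sketch_atomic_add(uint32_t *Address, uint32_t Val) {
  // __atomic_fetch_add returns the value stored at Address *before* the add,
  // which matches the return convention of CUDA's atomicAdd.
  return __atomic_fetch_add(Address, Val, Order);
}

// A caller that wants a weaker ordering can ask for it explicitly, e.g.
//   sketch_atomic_add<__ATOMIC_RELAXED>(&Counter, 1u);
// whereas an intrinsic without an ordering argument offers no such hook.
```

Because the builtin is target independent, the same wrapper could in principle sit in the common part and be compiled for both nvptx and amdgpu. Whether each ordering lowers well on each target would need checking separately.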
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D95294/new/
https://reviews.llvm.org/D95294