[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Aug 6 09:16:43 PDT 2019
ABataev added inline comments.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:19
+/// value of the pointee.
+template <typename T> T __kmpc_impl_atomic_add(T *Ptr, T Val) {
+ return atomicAdd(Ptr, Val);
----------------
JonChesterfield wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > Why do we need these functions? IMHO, they do not add anything useful, just increase code complexity.
> > They add a level of abstraction around the CUDA implementation, here the CUDA functions `atomicAdd` and `atomicExch`. In the AMD `target_impl.h` the `__kmpc_impl_atomic_exchange` would maybe be implemented with an explicit compare-exchange loop (I'm guessing here).
> As it happens, atomicExch works fine for amdgcn. I'm not sure all the atomics are supported.
>
> Closely related though, atomicMax doesn't appear to be available for __CUDA_ARCH__ < 350, so on nvptx there's a function in loop.cu (__kmpc_reduce_conditional_lastprivate) which open codes a max in terms of atomicCAS.
>
> Given some variation in capability between cuda revisions, a compatibility shim that implements at least atomicMax seems a good idea. I'm not sure writing a compatibility layer for all of the possible atomics is within scope - such a thing may already exist elsewhere.
It works for AMD but may not work for some other platforms. Better to make it target-dependent anyway.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64217/new/
https://reviews.llvm.org/D64217
More information about the Openmp-commits
mailing list