[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Jon Chesterfield via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Aug 6 09:11:45 PDT 2019


JonChesterfield added inline comments.


================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:19
+/// value of the pointee.
+template <typename T> T __kmpc_impl_atomic_add(T *Ptr, T Val) {
+  return atomicAdd(Ptr, Val);
----------------
jdoerfert wrote:
> ABataev wrote:
> > Why do we need these functions? IMHO, they do not add anything useful, just increase code complexity.
> They add a level of abstraction around the CUDA implementation, here the CUDA functions `atomicAdd` and `atomicExch`. In the AMD `target_impl.h` the `__kmpc_impl_atomic_exchange` would maybe be implemented with an explicit compare-exchange loop (I'm guessing here).
As it happens, atomicExch works fine for amdgcn. I'm not sure all the atomics are supported.

Closely related though, atomicMax doesn't appear to be available for  __CUDA_ARCH__ < 350, so on nvptx there's a function in loop.cu (__kmpc_reduce_conditional_lastprivate) which open codes a max in terms of atomicCAS. 

Given some variation in capability between cuda revisions, a compatibility shim that implements at least atomicMax seems a good idea. I'm not sure writing a compatibility layer for all of the possible atomics is within scope - such a thing may already exist elsewhere.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217





More information about the Openmp-commits mailing list