[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Tue Aug 6 09:16:43 PDT 2019

ABataev added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:19
+/// value of the pointee.
+template <typename T> T __kmpc_impl_atomic_add(T *Ptr, T Val) {
+  return atomicAdd(Ptr, Val);
----------------
JonChesterfield wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > Why do we need these functions? IMHO, they do not add anything useful, just increase code complexity.
> > They add a level of abstraction around the CUDA implementation, here the CUDA functions `atomicAdd` and `atomicExch`. In the AMD `target_impl.h` the `__kmpc_impl_atomic_exchange` would maybe be implemented with an explicit compare-exchange loop (I'm guessing here).
> As it happens, atomicExch works fine for amdgcn. I'm not sure all the atomics are supported.
> 
> Closely related though, atomicMax doesn't appear to be available for  __CUDA_ARCH__ < 350, so on nvptx there's a function in loop.cu (__kmpc_reduce_conditional_lastprivate) which open codes a max in terms of atomicCAS. 
> 
> Given some variation in capability between cuda revisions, a compatibility shim that implements at least atomicMax seems a good idea. I'm not sure writing a compatibility layer for all of the possible atomics is within scope - such a thing may already exist elsewhere.
It works for AMD but may not work for some other platforms. Better to make it target-dependent anyway.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217