[Openmp-commits] [PATCH] D101976: [OpenMP] Unified entry point for SPMD & generic kernels in the device RTL

Thu May 6 07:51:28 PDT 2021

jdoerfert added inline comments.

================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.cu:65-68
+  asm volatile("barrier.sync %0;"
+               :
+               : "r"(barrier)
+               : "memory");
----------------
ABataev wrote:
> Why not `__syncthreads`? It is safer to use `__syncthreads` as it is `convergent`. Would be good to mark this code somehow as `convergent` too to avoid incorrect optimizations
The problem is that syncthreads is basically a `bar.sync` which is a `barrier.sync.aligned`, if I understood everything properly. This worked so far because the "main thread" (lane 0, last warp) was alone in it's warp and all other threads have been terminated. Now, we simplify the control flow (and later get rid of the last warp) such that the threads of the last warp and the main thread will hit different barriers. The former hit the one in the state machine while the latter will be in `parallel_51`. The `.aligned` version doesn't allow that. Does that make sense?

I'm not concerned about convergent though, we solved that wholesale: We mark all functions that clang compiles for the GPU via openmp-target as convergent (IIRC). The entire device runtime is certainly convergent.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101976/new/

https://reviews.llvm.org/D101976