[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri May 24 07:47:49 PDT 2019

ABataev created this revision.
ABataev added reviewers: grokos, kkwli0, gtbercea.
Herald added subscribers: jdoerfert, jfb, guansong.
Herald added a project: OpenMP.

The parallel level counter is accessed simultaneously by many threads. In
combination with the atomic operations used in full-runtime SPMD mode
(triggered by dynamic scheduling, for example), this may lead to incorrect
results caused by compiler optimizations.
According to the CUDA Toolkit documentation (https://docs.nvidia.com/cuda/archive/8.0/cuda-c-programming-guide/index.html#volatile-qualifier):
The compiler is free to optimize reads and writes to global or shared
memory (for example, by caching global reads into registers or L1 cache)
as long as it respects the memory ordering semantics of memory fence
functions and memory visibility semantics of synchronization functions.

These optimizations can be disabled using the volatile keyword: If a
variable located in global or shared memory is declared as volatile, the
compiler assumes that its value can be changed or used at any time by
another thread and therefore any reference to this variable compiles to
an actual memory read or write instruction.

This is especially important in the case of thread divergence mixed with
atomic operations. In our case, it may lead to undefined behavior or
thread deadlocks, especially with CUDA 9 and later. This change is
required primarily for SPMD mode with the full runtime, or for SPMD mode
without the full runtime but with atomic operations.

Repository:
  rOMP OpenMP



Index: libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp
--- libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp
+++ libomptarget/deviceRTLs/nvptx/test/parallel/spmd_parallel_regions.cpp
@@ -47,5 +47,15 @@
   // CHECK: Expected count = 86
   printf("Expected count = %d\n", Count);
+  Count = 0;
+#pragma omp target parallel for reduction(+: Count) schedule(dynamic, 2) num_threads(64)
+  for (int J = 0; J < 1000; ++J) {
+    Count += J;
+  }
+  // Final result of Count is 1000 * (999-0) / 2
+  // CHECK: Expected count with dynamic scheduling = 499500
+  printf("Expected count with dynamic scheduling = %d\n", Count);
   return isHost;
Index: libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
--- libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
+++ libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
@@ -398,7 +398,7 @@
 extern __device__ __shared__ uint32_t usedMemIdx;
 extern __device__ __shared__ uint32_t usedSlotIdx;
-extern __device__ __shared__ uint8_t
+extern __device__ __shared__ volatile uint8_t
     parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
 extern __device__ __shared__ uint16_t threadLimit;
 extern __device__ __shared__ uint16_t threadsInTeam;
Index: libomptarget/deviceRTLs/nvptx/src/omp_data.cu
--- libomptarget/deviceRTLs/nvptx/src/omp_data.cu
+++ libomptarget/deviceRTLs/nvptx/src/omp_data.cu
@@ -31,7 +31,8 @@
 __device__ __shared__ uint32_t usedMemIdx;
 __device__ __shared__ uint32_t usedSlotIdx;
-__device__ __shared__ uint8_t parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
+__device__ __shared__ volatile uint8_t
+    parallelLevel[MAX_THREADS_PER_TEAM / WARPSIZE];
 __device__ __shared__ uint16_t threadLimit;
 __device__ __shared__ uint16_t threadsInTeam;
 __device__ __shared__ uint16_t nThreads;

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D62393.201245.patch
Type: text/x-patch
Size: 2039 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20190524/3a1e0fff/attachment.bin>

More information about the Openmp-commits mailing list