[Openmp-commits] [openmp] r369796 - [OPENMP][NVPTX]Use __syncwarp() to reconverge the threads.
Alexey Bataev via Openmp-commits
openmp-commits at lists.llvm.org
Fri Aug 23 11:34:48 PDT 2019
Author: abataev
Date: Fri Aug 23 11:34:48 2019
New Revision: 369796
URL: http://llvm.org/viewvc/llvm-project?rev=369796&view=rev
Log:
[OPENMP][NVPTX]Use __syncwarp() to reconverge the threads.
Summary:
With CUDA 9.0 it is no longer guaranteed that the threads in a warp are
convergent. We need to use the __syncwarp() function to reconverge the
threads and to guarantee memory ordering among the threads in a warp.
This is the first patch to fix the problem with the test
libomptarget/deviceRTLs/nvptx/src/sync.cu on CUDA 9+.
This patch replaces the calls to the __shfl_sync() function with calls to
the __syncwarp() function where the threads need to be reconverged before
modifying the value of the parallel level counter.
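For reference, the pattern adopted below can be sketched in plain CUDA
(a minimal sketch assuming CUDA >= 9.0; warpCounters and
incPerWarpCounter are illustrative names, not part of the deviceRTL):

// Minimal sketch, CUDA >= 9.0 only. warpCounters/incPerWarpCounter are
// hypothetical; the runtime uses parallelLevel[] and IncParallelLevel().
__device__ unsigned warpCounters[32];

__device__ void incPerWarpCounter() {
  unsigned Active = __activemask();       // lanes currently executing here
  __syncwarp(Active);                     // reconverge before the update
  unsigned LaneMaskLt;
  asm("mov.u32 %0, %%lanemask_lt;" : "=r"(LaneMaskLt));
  unsigned Rank = __popc(Active & LaneMaskLt); // rank among the active lanes
  if (Rank == 0) {                        // lowest active lane is the leader
    ++warpCounters[threadIdx.x / warpSize];    // assumes a 1-D thread block
    __threadfence();                      // publish before other lanes read
  }
  __syncwarp(Active);                     // reconverge after the update
}

Rank 0 is always the lowest-numbered active lane, so this elects the same
leader as the old __ffs(tnum) - 1 computation, but without relying on
__shfl_sync() to keep the warp converged.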
Reviewers: grokos
Subscribers: guansong, jfb, jdoerfert, caomhin, kkwli0, openmp-commits
Tags: #openmp
Differential Revision: https://reviews.llvm.org/D65013
Modified:
openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h
Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h?rev=369796&r1=369795&r2=369796&view=diff
==============================================================================
--- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h (original)
+++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/omptarget-nvptx.h Fri Aug 23 11:34:48 2019
@@ -55,11 +55,14 @@
#define __SHFL_DOWN_SYNC(mask, var, delta, width) \
__shfl_down_sync((mask), (var), (delta), (width))
#define __ACTIVEMASK() __activemask()
+#define __SYNCWARP(Mask) __syncwarp(Mask)
#else
#define __SHFL_SYNC(mask, var, srcLane) __shfl((var), (srcLane))
#define __SHFL_DOWN_SYNC(mask, var, delta, width) \
__shfl_down((var), (delta), (width))
#define __ACTIVEMASK() __ballot(1)
+// In Cuda < 9.0 no need to sync threads in warps.
+#define __SYNCWARP(Mask)
#endif // CUDA_VERSION
#define __SYNCTHREADS_N(n) asm volatile("bar.sync %0;" : : "r"(n) : "memory");
Modified: openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h
URL: http://llvm.org/viewvc/llvm-project/openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h?rev=369796&r1=369795&r2=369796&view=diff
==============================================================================
--- openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h (original)
+++ openmp/trunk/libomptarget/deviceRTLs/nvptx/src/supporti.h Fri Aug 23 11:34:48 2019
@@ -202,25 +202,31 @@ INLINE int IsTeamMaster(int ompThreadId)
// Parallel level
INLINE void IncParallelLevel(bool ActiveParallel) {
- unsigned tnum = __ACTIVEMASK();
- int leader = __ffs(tnum) - 1;
- __SHFL_SYNC(tnum, leader, leader);
- if (GetLaneId() == leader) {
+ unsigned Active = __ACTIVEMASK();
+ __SYNCWARP(Active);
+ unsigned LaneMaskLt;
+ asm("mov.u32 %0, %%lanemask_lt;" : "=r"(LaneMaskLt));
+ unsigned Rank = __popc(Active & LaneMaskLt);
+ if (Rank == 0) {
parallelLevel[GetWarpId()] +=
(1 + (ActiveParallel ? OMP_ACTIVE_PARALLEL_LEVEL : 0));
+ __threadfence();
}
- __SHFL_SYNC(tnum, leader, leader);
+ __SYNCWARP(Active);
}
INLINE void DecParallelLevel(bool ActiveParallel) {
- unsigned tnum = __ACTIVEMASK();
- int leader = __ffs(tnum) - 1;
- __SHFL_SYNC(tnum, leader, leader);
- if (GetLaneId() == leader) {
+ unsigned Active = __ACTIVEMASK();
+ __SYNCWARP(Active);
+ unsigned LaneMaskLt;
+ asm("mov.u32 %0, %%lanemask_lt;" : "=r"(LaneMaskLt));
+ unsigned Rank = __popc(Active & LaneMaskLt);
+ if (Rank == 0) {
parallelLevel[GetWarpId()] -=
(1 + (ActiveParallel ? OMP_ACTIVE_PARALLEL_LEVEL : 0));
+ __threadfence();
}
- __SHFL_SYNC(tnum, leader, leader);
+ __SYNCWARP(Active);
}
////////////////////////////////////////////////////////////////////////////////
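As a usage illustration only (a hypothetical standalone check, not the
libomptarget sync.cu test referenced above), the pattern can be exercised
end to end; with 64 threads in one block, each of the two warps should
record exactly one increment:

// Hypothetical standalone check; all names here are illustrative.
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned Counters[2];

__global__ void bumpOncePerWarp() {
  unsigned Active = __activemask();
  __syncwarp(Active);
  unsigned LaneMaskLt;
  asm("mov.u32 %0, %%lanemask_lt;" : "=r"(LaneMaskLt));
  if (__popc(Active & LaneMaskLt) == 0) { // only the lowest active lane
    ++Counters[threadIdx.x / warpSize];
    __threadfence();
  }
  __syncwarp(Active);
}

int main() {
  bumpOncePerWarp<<<1, 64>>>();
  cudaDeviceSynchronize();
  unsigned Host[2];
  cudaMemcpyFromSymbol(Host, Counters, sizeof(Host));
  printf("warp 0: %u, warp 1: %u\n", Host[0], Host[1]); // expect 1 and 1
  return 0;
}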