[Openmp-commits] [openmp] b6b53ff - [libomptarget][devicertl] Remove branches around setting parallelLevel

Jon Chesterfield via Openmp-commits openmp-commits at lists.llvm.org
Tue Jul 13 04:07:24 PDT 2021


Author: Jon Chesterfield
Date: 2021-07-13T12:06:57+01:00
New Revision: b6b53ffef4414ed62701a63ad28e70cfd9d26191

URL: https://github.com/llvm/llvm-project/commit/b6b53ffef4414ed62701a63ad28e70cfd9d26191
DIFF: https://github.com/llvm/llvm-project/commit/b6b53ffef4414ed62701a63ad28e70cfd9d26191.diff

LOG: [libomptarget][devicertl] Remove branches around setting parallelLevel

Simplifies control flow to allow store/load forwarding

This change folds two basic blocks into one, leaving a single store to parallelLevel.
This is a step towards SPMD kernels in which, given sufficiently aggressive inlining,
the loads from parallelLevel are folded away and the nested parallel handling is
discarded when it is unused.

Transform:
```
int threadId = GetThreadIdInBlock();
if (threadId == 0) {
  parallelLevel[0] = expr;
} else if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// =>
if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// because
unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1);}
// so whenever threadId == 0, GetLaneId() is also 0.
```

This replaces stores in two distinct basic blocks with a single store.
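
The equivalence can be checked on the host with a small sketch. This is not
part of the commit: the namespace, the helper names laneId/warpId, the
WARPSIZE and MAX_THREADS values, and the constant standing in for expr are
all assumptions made for illustration, and threads are simulated one after
another rather than run on a device.

```cpp
#include <array>
#include <cassert>

namespace fold_sketch {

// Assumed values for illustration only; the real runtime defines its own.
constexpr unsigned WARPSIZE = 32;
constexpr unsigned MAX_THREADS = 1024;
constexpr unsigned MAX_WARPS = MAX_THREADS / WARPSIZE;

using Levels = std::array<int, MAX_WARPS>;

// Hypothetical host-side stand-ins for GetLaneId() and GetWarpId().
unsigned laneId(unsigned tid) { return tid & (WARPSIZE - 1); }
unsigned warpId(unsigned tid) { return tid / WARPSIZE; }

// Control flow before the change: stores in two distinct branches.
Levels runOriginal(unsigned numThreads, int expr) {
  Levels parallelLevel{};
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    if (tid == 0) {
      parallelLevel[0] = expr;
    } else if (laneId(tid) == 0) {
      parallelLevel[warpId(tid)] = expr;
    }
  }
  return parallelLevel;
}

// Control flow after the change: a single guarded store.
Levels runFolded(unsigned numThreads, int expr) {
  Levels parallelLevel{};
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    if (laneId(tid) == 0) {
      parallelLevel[warpId(tid)] = expr;
    }
  }
  return parallelLevel;
}

// Compare both forms for every block size up to the assumed maximum.
void checkBranchFolding() {
  for (unsigned n = 1; n <= MAX_THREADS; ++n)
    assert(runOriginal(n, 7) == runFolded(n, 7));
}

} // namespace fold_sketch
```

Calling fold_sketch::checkBranchFolding() from a test driver exercises every
block size from 1 to MAX_THREADS. The two forms agree because thread 0 has
lane ID 0 and warp ID 0, so the first branch of the original is a special
case of the second.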

A more aggressive follow-up is possible if it is acceptable for the threads
in the warp/wave to race to write the same value to the same address. This
is not done as part of this change.

```
if (GetLaneId() == 0) {
  parallelLevel[GetWarpId()] = expr;
}
// =>
parallelLevel[GetWarpId()] = expr;
// because
unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
// so GetWarpId will index the same element for every thread in the warp
// and, because expr is lane-invariant in this case, every lane stores the
// same value to this unique address
```
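
A host-side sketch of that follow-up, again not part of the commit. The
namespace, helper names, WARPSIZE and MAX_THREADS values, and the constant
standing in for expr are assumptions. Threads are simulated sequentially, so
this only checks that the final contents of parallelLevel match; it says
nothing about whether concurrent same-value stores are acceptable under a
given target's memory model.

```cpp
#include <array>
#include <cassert>

namespace race_sketch {

// Assumed values for illustration only; the real runtime defines its own.
constexpr unsigned WARPSIZE = 32;
constexpr unsigned MAX_THREADS = 1024;
constexpr unsigned MAX_WARPS = MAX_THREADS / WARPSIZE;

using Levels = std::array<int, MAX_WARPS>;

// Hypothetical host-side stand-ins for GetLaneId() and GetWarpId().
unsigned laneId(unsigned tid) { return tid & (WARPSIZE - 1); }
unsigned warpId(unsigned tid) { return tid / WARPSIZE; }

// Only lane 0 of each warp stores (the form introduced by this change).
Levels runLaneZeroOnly(unsigned numThreads, int expr) {
  Levels parallelLevel{};
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    if (laneId(tid) == 0) {
      parallelLevel[warpId(tid)] = expr;
    }
  }
  return parallelLevel;
}

// Every lane stores the same lane-invariant value to its warp's slot.
Levels runEveryLane(unsigned numThreads, int expr) {
  Levels parallelLevel{};
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    parallelLevel[warpId(tid)] = expr;
  }
  return parallelLevel;
}

// Compare final contents for every block size up to the assumed maximum.
void checkUnconditionalStore() {
  for (unsigned n = 1; n <= MAX_THREADS; ++n)
    assert(runLaneZeroOnly(n, 5) == runEveryLane(n, 5));
}

} // namespace race_sketch
```

The results match even when the last warp is only partially populated,
because any warp that contains at least one thread also contains its lane 0.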

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D105699

Added: 
    

Modified: 
    openmp/libomptarget/deviceRTLs/common/src/omptarget.cu

Removed: 
    


################################################################################
diff  --git a/openmp/libomptarget/deviceRTLs/common/src/omptarget.cu b/openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
index 153754fc3fdda..21608549edf1b 100644
--- a/openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
+++ b/openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
@@ -90,12 +90,13 @@ static void __kmpc_spmd_kernel_init(bool RequiresFullRuntime) {
   int threadId = GetThreadIdInBlock();
   if (threadId == 0) {
     usedSlotIdx = __kmpc_impl_smid() % MAX_SM;
-    parallelLevel[0] =
-        1 + (GetNumberOfThreadsInBlock() > 1 ? OMP_ACTIVE_PARALLEL_LEVEL : 0);
-  } else if (GetLaneId() == 0) {
+  }
+
+  if (GetLaneId() == 0) {
     parallelLevel[GetWarpId()] =
         1 + (GetNumberOfThreadsInBlock() > 1 ? OMP_ACTIVE_PARALLEL_LEVEL : 0);
   }
+
   __kmpc_data_sharing_init_stack();
   if (!RequiresFullRuntime)
     return;


        

