[Openmp-commits] [PATCH] D105699: [libomptarget][devicertl] Remove branches around setting parallelLevel
Jon Chesterfield via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jul 9 07:25:41 PDT 2021
JonChesterfield created this revision.
JonChesterfield added reviewers: jdoerfert, ABataev, grokos, tianshilei1992, ye-luo, ronlieb, carlo.bertolli, pdhaliwal, ggeorgakoudis, Meinersbur.
JonChesterfield requested review of this revision.
Herald added a project: OpenMP.
Herald added a subscriber: openmp-commits.
Simplifies control flow to allow store/load forwarding.
Specifically, for SPMD kernels with sufficiently aggressive inlining, this
change allows the loads from parallelLevel in parallel_51 and global_thread_num
to constant fold, removing most of the handling for potentially nested parallel
regions when there are none. At present, the parallelLevel array remains, with
a single store to it.
There are two transforms here. First,
  int threadId = GetThreadIdInBlock();
  if (threadId == 0) {
    parallelLevel[0] = expr;
  } else if (GetLaneId() == 0) {
    parallelLevel[GetWarpId()] = expr;
  }
  // =>
  if (GetLaneId() == 0) {
    parallelLevel[GetWarpId()] = expr;
  }
  // because
  unsigned GetLaneId() { return GetThreadIdInBlock() & (WARPSIZE - 1); }
  // so whenever threadId == 0, GetLaneId() is also 0.
That replaces a store in two distinct basic blocks with a single store.
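The equivalence claimed by the first transform can be checked on the host with a small simulation. This is only a sketch: the helpers below take the thread id as a parameter instead of reading it from hardware, WARPSIZE = 32 and MAX_THREADS = 1024 are assumed values, and storeBranchy, storeLaneZero and firstTransformPreservesStores are hypothetical names that do not appear in the patch.

```cpp
#include <array>
#include <cassert>

// Assumed values for illustration; the real runtime takes these from the target.
constexpr unsigned WARPSIZE = 32;
constexpr unsigned MAX_THREADS = 1024;
constexpr unsigned NUM_WARPS = MAX_THREADS / WARPSIZE;

// Host stand-ins for the device helpers, parameterised on the thread id.
constexpr unsigned GetLaneId(unsigned threadId) { return threadId & (WARPSIZE - 1); }
constexpr unsigned GetWarpId(unsigned threadId) { return threadId / WARPSIZE; }

using Levels = std::array<unsigned char, NUM_WARPS>;

// Original control flow: the store appears in two distinct basic blocks.
void storeBranchy(Levels &parallelLevel, unsigned threadId, unsigned char expr) {
  if (threadId == 0) {
    parallelLevel[0] = expr;
  } else if (GetLaneId(threadId) == 0) {
    parallelLevel[GetWarpId(threadId)] = expr;
  }
}

// After the first transform: a single guarded store.
void storeLaneZero(Levels &parallelLevel, unsigned threadId, unsigned char expr) {
  if (GetLaneId(threadId) == 0) {
    parallelLevel[GetWarpId(threadId)] = expr;
  }
}

// Runs each simulated thread through both versions, starting from a zeroed
// array, and reports whether both versions always write the same thing.
bool firstTransformPreservesStores(unsigned numThreads, unsigned char expr) {
  assert(numThreads <= MAX_THREADS);
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    Levels before{}, after{};
    storeBranchy(before, tid, expr);
    storeLaneZero(after, tid, expr);
    if (before != after)
      return false;
  }
  return true;
}
```

Calling firstTransformPreservesStores(1024, 1) covers every thread id of a full block under these assumptions. Thread 0 is the only case handled by the first branch, and it has lane 0 and warp 0, so both versions write parallelLevel[0].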
Second,
  if (GetLaneId() == 0) {
    parallelLevel[GetWarpId()] = expr;
  }
  // =>
  parallelLevel[GetWarpId()] = expr;
  // because
  unsigned GetWarpId() { return GetThreadIdInBlock() / WARPSIZE; }
  // so GetWarpId will index the same element for every thread in the warp
  // and, because expr is lane-invariant in this case, every lane stores the
  // same value to this unique address
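The second transform can be sanity-checked the same way. The sketch below runs the threads of one block sequentially on the host, so it says nothing about how hardware handles concurrent same-value stores. It only shows that the final contents of parallelLevel match when expr is lane-invariant. WARPSIZE = 32 is an assumed value, and runGuarded, runUnguarded and secondTransformPreservesResult are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Assumed warp size for illustration.
constexpr unsigned WARPSIZE = 32;

// Host stand-ins for the device helpers, parameterised on the thread id.
unsigned GetLaneId(unsigned threadId) { return threadId & (WARPSIZE - 1); }
unsigned GetWarpId(unsigned threadId) { return threadId / WARPSIZE; }

// Before the second transform: only lane 0 of each warp stores.
std::vector<int> runGuarded(unsigned numThreads, int expr) {
  std::vector<int> parallelLevel((numThreads + WARPSIZE - 1) / WARPSIZE, 0);
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    if (GetLaneId(tid) == 0) {
      assert(GetWarpId(tid) < parallelLevel.size());
      parallelLevel[GetWarpId(tid)] = expr;
    }
  }
  return parallelLevel;
}

// After the second transform: every lane stores the same value to its
// warp's slot.
std::vector<int> runUnguarded(unsigned numThreads, int expr) {
  std::vector<int> parallelLevel((numThreads + WARPSIZE - 1) / WARPSIZE, 0);
  for (unsigned tid = 0; tid < numThreads; ++tid) {
    assert(GetWarpId(tid) < parallelLevel.size());
    parallelLevel[GetWarpId(tid)] = expr;
  }
  return parallelLevel;
}

// True when both versions leave the array in the same final state.
bool secondTransformPreservesResult(unsigned numThreads, int expr) {
  return runGuarded(numThreads, expr) == runUnguarded(numThreads, expr);
}
```

A partially filled last warp (for example numThreads = 33) still contains its lane 0, because thread ids are contiguous from 0, so the guarded version writes that warp's slot as well.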
The first transform is always an optimisation. The second is more debatable -
a GPU may use more power when every lane writes to a given address than when
it is masked off, but equally there is a cost to the masking and unmasking.
If the second transform is omitted, the CFG for an SPMD kernel has two entry
points into the parallel_51 call (on LaneId == 0). With it included, the
calls into parallel_51 and global_thread_num are in the same basic block
as the write to parallelLevel, and load/store forwarding removes the loads.
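The difference between the two CFG shapes can be sketched in isolation. The two functions below are hypothetical and much simpler than the runtime code. They only contrast a load that follows a guarded store with a load that follows an unconditional store in the same basic block.

```cpp
#include <cassert>

// Guarded store: the value read back depends on which path was taken, so
// the load cannot simply be replaced by expr.
unsigned char levelAfterGuardedStore(unsigned char *slot, bool isLaneZero,
                                     unsigned char expr) {
  assert(slot != nullptr);
  if (isLaneZero) {
    *slot = expr;
  }
  return *slot;
}

// Unconditional store followed by a load from the same address in the same
// basic block: an optimiser may forward expr to the load and drop the load.
unsigned char levelAfterPlainStore(unsigned char *slot, unsigned char expr) {
  assert(slot != nullptr);
  *slot = expr;
  return *slot;
}
```

In the second function the result is always expr, whatever the slot held before. That is the property which lets the later reads of parallelLevel fold once they sit in the same block as the store.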
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D105699
Files:
openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
Index: openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
===================================================================
--- openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
+++ openmp/libomptarget/deviceRTLs/common/src/omptarget.cu
@@ -93,12 +93,11 @@
int threadId = GetThreadIdInBlock();
if (threadId == 0) {
usedSlotIdx = __kmpc_impl_smid() % MAX_SM;
- parallelLevel[0] =
- 1 + (GetNumberOfThreadsInBlock() > 1 ? OMP_ACTIVE_PARALLEL_LEVEL : 0);
- } else if (GetLaneId() == 0) {
- parallelLevel[GetWarpId()] =
- 1 + (GetNumberOfThreadsInBlock() > 1 ? OMP_ACTIVE_PARALLEL_LEVEL : 0);
}
+
+ parallelLevel[GetWarpId()] =
+ 1 + (GetNumberOfThreadsInBlock() > 1 ? OMP_ACTIVE_PARALLEL_LEVEL : 0);
+
__kmpc_data_sharing_init_stack();
if (!RequiresOMPRuntime) {
// Runtime is not required - exit.