[Openmp-commits] [PATCH] D113602: [OpenMP] Fix master thread barrier for Pascal and amdgpu
Johannes Doerfert via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Nov 11 17:40:21 PST 2021
jdoerfert added inline comments.
================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:3463
+ // BlockSize = BlockHwSize - WarpSize;
+ // bool IsWorker = InitCB >= 0 && InitCB < BlockSize;
// if (IsWorker) {
----------------
This doesn't quite work because now every thread in the last warp will execute the user code.
I think the minimal addition is
```
if (InitCB <u BlockSize)
  return;
```
and then whatever we had before.
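
To make the concern about the last warp concrete, here is a small host-side model of the `IsWorker` check quoted above. It is only a sketch, not the OpenMPOpt code. It assumes that the main thread is the first thread of the last warp and that the init call yields -1 for the main thread and the thread id for every other thread. `Counts` and `classify` are hypothetical names introduced for illustration.

```cpp
// Hypothetical model: counts how the threads of one block are classified by
// the check from the diff context.
struct Counts {
  int Workers = 0;  // threads that would enter the state machine
  int UserCode = 0; // threads that would fall through to the user code
};

Counts classify(int BlockHwSize, int WarpSize) {
  // As in the diff comment: BlockSize = BlockHwSize - WarpSize.
  const int BlockSize = BlockHwSize - WarpSize;
  // Assumption: the main thread is the first thread of the last warp.
  const int MainThreadId = BlockSize;

  Counts C;
  for (int Tid = 0; Tid < BlockHwSize; ++Tid) {
    // Assumption: the init call yields -1 for the main thread and the
    // thread id for all other threads.
    const int InitCB = (Tid == MainThreadId) ? -1 : Tid;
    const bool IsWorker = InitCB >= 0 && InitCB < BlockSize;
    if (IsWorker)
      ++C.Workers;
    else
      ++C.UserCode;
  }
  return C;
}
```

Under these assumptions, `classify(128, 32)` reports 96 workers and 32 threads reaching the user code, where only the main thread was intended to get there.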
================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:3547
+ IsWorker->setDebugLoc(DLoc);
+ BranchInst::Create(StateMachineBeginBB, UserCodeEntryBB, IsWorker, InitBB);
----------------
-master +main
================
Comment at: openmp/libomptarget/DeviceRTL/src/Kernel.cpp:103
+ if (UseGenericStateMachine &&
+ mapping::getThreadIdInBlock() < mapping::getBlockSize())
genericStateMachine(Ident);
----------------
-master +main
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D113602/new/
https://reviews.llvm.org/D113602