[PATCH] D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer
Brian Sumner via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 19 12:25:23 PDT 2021
b-sumner added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:299
+ B.CreateCall(UpdateDPP,
+ {Identity, V, B.getInt32(DPP::ROW_XMASK0 | 1 << Idx),
+ B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
----------------
foad wrote:
> foad wrote:
> > b-sumner wrote:
> > > This requires all lanes to be active. Are we guaranteed that the work group size will be an integer multiple of the wave size?
> > The reduction or scan runs in whole wave mode. All lanes are active.
> ... and lanes that weren't active to start with are set to an appropriate identity value for the operation.
But suppose the launched grid has size 66. With a wave size of 64, that means one wave has only 2 active lanes, and I'm not aware that WWM can actually activate the rest of them.
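
To make the scenario concrete, here is a rough host-side model of it in plain C++. It is only a sketch under stated assumptions: a wave size of 64, a 1-D grid, an add reduction whose identity is 0, and a simple XOR butterfly standing in for the DPP sequence in the patch. The names lanesInLastWave and butterflyAddReduce are hypothetical and do not come from AMDGPUAtomicOptimizer.cpp. It shows that the result is correct if the lanes beyond the launched ones can be made to hold the identity; it says nothing about whether WWM on real hardware can enable those lanes.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: wave64. The real wave size depends on the target and mode.
constexpr unsigned kWaveSize = 64;

// Number of lanes that carry real work items in the last wave of a 1-D grid.
// Hypothetical helper, for illustration only.
unsigned lanesInLastWave(unsigned gridSize) {
  assert(gridSize > 0);
  unsigned rem = gridSize % kWaveSize;
  return rem == 0 ? kWaveSize : rem;
}

// Host-side model of an XOR-butterfly add reduction over one wave.
// Lanes at index >= activeLanes are overwritten with the identity (0 for
// add), which is what the review thread says happens to inactive lanes.
uint32_t butterflyAddReduce(const std::array<uint32_t, kWaveSize> &values,
                            unsigned activeLanes) {
  assert(activeLanes <= kWaveSize);

  std::array<uint32_t, kWaveSize> lanes{};
  for (unsigned i = 0; i < kWaveSize; ++i)
    lanes[i] = i < activeLanes ? values[i] : 0u;

  // Each step combines lane i with lane (i ^ mask). This reads every lane,
  // so the model only works if all 64 lanes hold a defined value.
  for (unsigned mask = 1; mask < kWaveSize; mask <<= 1) {
    std::array<uint32_t, kWaveSize> next{};
    for (unsigned i = 0; i < kWaveSize; ++i)
      next[i] = lanes[i] + lanes[i ^ mask];
    lanes = next;
  }

  // After the last step every lane holds the full sum; return lane 0.
  return lanes[0];
}
```

For a grid of 66, lanesInLastWave gives 2, and reducing that wave sums only the first two lane values as long as the other 62 lanes really contain 0.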
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98953/new/
https://reviews.llvm.org/D98953