[PATCH] D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer
Jay Foad via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 19 14:02:24 PDT 2021
foad added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:299
+ B.CreateCall(UpdateDPP,
+ {Identity, V, B.getInt32(DPP::ROW_XMASK0 | 1 << Idx),
+ B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
----------------
b-sumner wrote:
> foad wrote:
> > foad wrote:
> > > b-sumner wrote:
> > > > This requires all lanes to be active. Are we guaranteed that the work group size will be an integer multiple of the wave size?
> > > The reduction or scan runs in whole wave mode. All lanes are active.
> > ... and lanes that weren't active to start with are set to an appropriate identity value for the operation.
> But suppose the launched grid has size 66. That means one wave has only 2 active lanes, and I'm not aware that WWM can actually activate the rest of them.
That's exactly what WWM does: unconditionally activates all lanes. You can see that in the tests in this patch (both before and after my changes): `s_or_saveexec_b64 s[0:1], -1` sets all bits in the exec mask.
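
To make the grid-size-66 case concrete, here is a minimal host-side sketch in plain C++ that simulates one wave64. It is not the code in AMDGPUAtomicOptimizer.cpp. The names `orSaveExec`, `setInactive`, `xmaskStep` and `rowReduceAdd` are hypothetical, and the loop over `Idx` (masks 1, 2, 4, 8 within a 16-lane row) is an assumption based on the `1 << Idx` in the quoted hunk. It uses integer add with identity 0 as the example operation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int WaveSize = 64; // wave64, matching the s_or_saveexec_b64 example
constexpr int RowSize = 16;  // a DPP row is 16 lanes

using Wave = std::array<uint32_t, WaveSize>;

// Models `s_or_saveexec_b64 Saved, Src`: return the old exec mask and OR Src
// into exec. With Src == -1 every lane becomes active.
uint64_t orSaveExec(uint64_t &Exec, uint64_t Src) {
  uint64_t Saved = Exec;
  Exec |= Src;
  return Saved;
}

// Lanes that were inactive in the original exec mask get the identity value,
// so they cannot affect the result of the reduction.
Wave setInactive(const Wave &V, uint64_t OrigExec, uint32_t Identity) {
  Wave Out{};
  for (int Lane = 0; Lane < WaveSize; ++Lane)
    Out[Lane] = ((OrigExec >> Lane) & 1) ? V[Lane] : Identity;
  return Out;
}

// One butterfly step: every lane adds the value held by lane ^ Mask.
// Because Mask < RowSize, the partner lane is always in the same row.
Wave xmaskStep(const Wave &V, int Mask) {
  assert(Mask > 0 && Mask < RowSize && "mask must stay within a row");
  Wave Out{};
  for (int Lane = 0; Lane < WaveSize; ++Lane)
    Out[Lane] = V[Lane] + V[Lane ^ Mask];
  return Out;
}

// After steps with masks 1, 2, 4 and 8, every lane of a row holds the sum of
// that row's 16 lanes.
Wave rowReduceAdd(Wave V) {
  for (int Idx = 0; (1 << Idx) < RowSize; ++Idx)
    V = xmaskStep(V, 1 << Idx);
  return V;
}
```

With an original exec mask of 0x3 (the second wave of a 66-item grid), lanes 2..63 may hold arbitrary values, but after `setInactive` they contribute 0, so each lane of row 0 ends up with the sum of the two active lanes only.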
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98953/new/
https://reviews.llvm.org/D98953