[PATCH] D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer

Fri Mar 19 14:02:24 PDT 2021

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:299
+        B.CreateCall(UpdateDPP,
+                     {Identity, V, B.getInt32(DPP::ROW_XMASK0 | 1 << Idx),
+                      B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
----------------
b-sumner wrote:
> foad wrote:
> > foad wrote:
> > > b-sumner wrote:
> > > > This requires all lanes to be active. Are we guaranteed that the work group size will be an integer multiple of the wave size?
> > > The reduction or scan runs in whole wave mode. All lanes are active.
> > ... and lanes that weren't active to start with are set to an appropriate identity value for the operation.
> But suppose the launched grid has size 66. That means one wave has only 2 active lanes, and I'm not aware that WWM can actually activate the rest of them.
That's exactly what WWM does: unconditionally activates all lanes. You can see that in the tests in this patch (both before and after my changes): `s_or_saveexec_b64 s[0:1], -1` sets all bits in the exec mask.
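
To make the argument above concrete, here is a rough host-side model of the idea, not the code from this patch. It assumes a 64-lane wave and an add reduction (identity 0), and it uses a plain XOR butterfly over all lanes as a stand-in for the real DPP sequence. The names `waveReduceAdd`, `kWaveSize` and `execMask` are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 64-lane wave; not LLVM or hardware code.
constexpr std::size_t kWaveSize = 64;

// Models an add reduction run with all lanes enabled (as in WWM).
// execMask holds the lanes that were active before entering WWM.
std::array<uint32_t, kWaveSize>
waveReduceAdd(const std::array<uint32_t, kWaveSize> &values,
              uint64_t execMask) {
  constexpr uint32_t identity = 0; // identity value for integer add

  // Lanes that were inactive to start with contribute the identity,
  // so whatever their registers held cannot affect the result.
  std::array<uint32_t, kWaveSize> v{};
  for (std::size_t lane = 0; lane < kWaveSize; ++lane)
    v[lane] = ((execMask >> lane) & 1) ? values[lane] : identity;

  // XOR butterfly: at each step every lane adds the value held by the
  // lane whose index differs in exactly one bit. After log2(64) = 6
  // steps, every lane holds the sum over the whole wave.
  for (std::size_t mask = 1; mask < kWaveSize; mask <<= 1) {
    std::array<uint32_t, kWaveSize> next{};
    for (std::size_t lane = 0; lane < kWaveSize; ++lane)
      next[lane] = v[lane] + v[lane ^ mask];
    v = next;
  }
  return v;
}
```

For the grid-of-66 case, the second wave would correspond to `execMask == 0x3` in this model: lanes 2..63 are replaced by the identity, and every lane ends up with the sum of lanes 0 and 1 only.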

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98953/new/

https://reviews.llvm.org/D98953