[PATCH] D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer

Fri Mar 19 12:25:23 PDT 2021

b-sumner added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:299
+        B.CreateCall(UpdateDPP,
+                     {Identity, V, B.getInt32(DPP::ROW_XMASK0 | 1 << Idx),
+                      B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
----------------
foad wrote:
> foad wrote:
> > b-sumner wrote:
> > > This requires all lanes to be active. Are we guaranteed that the work group size will be an integer multiple of the wave size?
> > The reduction or scan runs in whole wave mode. All lanes are active.
> ... and lanes that weren't active to start with are set to an appropriate identity value for the operation.
But suppose the launched grid has size 66. That means one wave has only 2 active lanes, and I'm not aware that whole wave mode (WWM) can actually activate the rest of them.
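
To make the scenario concrete, here is a minimal host-side sketch of the reduction being discussed. It is not the optimizer's code and does not model DPP rows or real hardware. The names `kWaveSize` and `simulateWaveReduceAdd` are hypothetical, a wave size of 64 is assumed, and the operation is assumed to be an integer add with identity 0. It shows that an XOR-mask (butterfly) reduction gives the right answer for a partially filled wave only if every lane outside the active mask really does hold the identity before the first step, which is the property in question.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed wave size for illustration only.
constexpr std::size_t kWaveSize = 64;

// Simulates an add-reduction across one wave using XOR-mask exchanges.
// Lanes whose bit is clear in activeMask are first overwritten with the
// identity (0 for add), mirroring the claim made in the thread.
// After log2(kWaveSize) steps every lane holds the sum of the active lanes.
std::array<std::uint32_t, kWaveSize>
simulateWaveReduceAdd(std::array<std::uint32_t, kWaveSize> lanes,
                      std::uint64_t activeMask) {
  const std::uint32_t identity = 0;
  for (std::size_t i = 0; i < kWaveSize; ++i)
    if (((activeMask >> i) & 1u) == 0)
      lanes[i] = identity;

  for (std::size_t mask = 1; mask < kWaveSize; mask <<= 1) {
    const auto prev = lanes; // all lanes read before any lane writes
    for (std::size_t i = 0; i < kWaveSize; ++i)
      lanes[i] = prev[i] + prev[i ^ mask];
  }
  return lanes;
}
```

For the grid-size-66 case, the last wave would be called with `activeMask = 0x3`. If the 62 remaining lanes could not be enabled and written with the identity, the `prev[i ^ mask]` reads above would have no defined counterpart on hardware.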

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98953/new/

https://reviews.llvm.org/D98953