[PATCH] D98953: [AMDGPU] Use reductions instead of scans in the atomic optimizer

Fri Mar 19 11:37:13 PDT 2021

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:299
+        B.CreateCall(UpdateDPP,
+                     {Identity, V, B.getInt32(DPP::ROW_XMASK0 | 1 << Idx),
+                      B.getInt32(0xf), B.getInt32(0xf), B.getFalse()}));
----------------
foad wrote:
> b-sumner wrote:
> > This requires all lanes to be active. Are we guaranteed that the work group size will be an integer multiple of the wave size?
> The reduction or scan runs in whole wave mode. All lanes are active.
... and lanes that weren't active to start with are set to an appropriate identity value for the operation.
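
To illustrate the point about identity values, here is a minimal CPU-side sketch of an add-reduction over one 16-lane row. It is not the code in this patch. It assumes that a row_xmask DPP operation with mask `1 << Idx` makes each lane read from lane `Lane ^ (1 << Idx)` within its row, and that the operation is integer add (identity 0). The names `simulateRowReduceAdd`, `RowSize` and `ActiveMask` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: a DPP row is 16 lanes wide.
constexpr unsigned RowSize = 16;

// Hypothetical helper: simulate an add-reduction across one row.
// Values holds the per-lane inputs; bit i of ActiveMask is set if lane i
// was active before entering whole wave mode.
std::array<uint32_t, RowSize>
simulateRowReduceAdd(std::array<uint32_t, RowSize> Values,
                     uint16_t ActiveMask) {
  const uint32_t Identity = 0; // identity value for integer add

  // Lanes that were not active to start with get the identity value, so
  // they cannot change the result once every lane takes part.
  for (unsigned Lane = 0; Lane < RowSize; ++Lane)
    if (!(ActiveMask & (1u << Lane)))
      Values[Lane] = Identity;

  // Butterfly steps: each lane combines its own value with the value of
  // the lane whose index differs in bit Idx.
  for (unsigned Idx = 0; Idx < 4; ++Idx) {
    const std::array<uint32_t, RowSize> Prev = Values;
    for (unsigned Lane = 0; Lane < RowSize; ++Lane)
      Values[Lane] = Prev[Lane] + Prev[Lane ^ (1u << Idx)];
  }

  // After log2(RowSize) steps every lane should hold the same total.
  for (unsigned Lane = 1; Lane < RowSize; ++Lane)
    assert(Values[Lane] == Values[0]);
  return Values;
}
```

In this sketch, calling the helper with only lanes 0 and 2 marked active yields the sum of just those two inputs in every lane, regardless of what the other lanes held.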

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98953/new/

https://reviews.llvm.org/D98953