[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.

Sat Apr 29 00:59:24 PDT 2023

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:664-665
+  if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+    Compute = BasicBlock::Create(C, "Compute", F);
+    ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
+  }
----------------
pravinjagtap wrote:
> foad wrote:
> > Sink this down to line 700, where you use them?
> 
> `ComputeEnd` is required at lines 766 and 770, after the `if (ValDivergent)` block. That's why it is hoisted here.
I am suggesting putting these two lines immediately before the call to buildScanIteratively (now line 695).
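For context on what the `Compute` / `ComputeEnd` blocks are for, here is a rough CPU-side model of an iterative scan over one wavefront. It is a sketch under stated assumptions, not the pass's code: a 64-lane wave, an integer add as the operation, and the names `kWaveSize`, `ScanResult`, `lowestSetBit` and `iterativeScan` are all hypothetical. Each loop trip stands in for one pass through the `Compute` block (pick the lowest remaining active lane, record the running total for it, accumulate its value, clear its bit), and leaving the loop stands in for branching to `ComputeEnd`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical CPU model of one 64-lane wavefront; not LLVM code.
constexpr int kWaveSize = 64;

struct ScanResult {
  std::array<uint32_t, kWaveSize> exclusive{};  // per-lane exclusive prefix sum
  uint32_t total = 0;                           // sum over all active lanes
};

// Index of the lowest set bit; stands in for a cttz on the exec mask.
int lowestSetBit(uint64_t mask) {
  assert(mask != 0 && "caller must pass a non-empty mask");
  int bit = 0;
  while (((mask >> bit) & 1) == 0) ++bit;
  return bit;
}

// Visits active lanes one at a time, lowest index first. One loop trip
// models one pass through the "Compute" block; exiting the loop models
// the branch to "ComputeEnd".
ScanResult iterativeScan(uint64_t exec,
                         const std::array<uint32_t, kWaveSize> &vals) {
  ScanResult r;
  for (uint64_t remaining = exec; remaining != 0; remaining &= remaining - 1) {
    int lane = lowestSetBit(remaining);  // lowest lane not yet processed
    r.exclusive[lane] = r.total;         // sum of earlier active lanes
    r.total += vals[lane];               // fold in this lane's value
  }
  return r;
}
```

Inactive lanes keep a zero entry in `exclusive`; only the entries for active lanes are meaningful.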

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:740-746
+  if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+    // Only the first active lane will enter the new control flow to update the
+    // value.
+    CallInst *const FirstActiveLane =
+        B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, Mbcnt);
+    Cond = B.CreateICmpEQ(Mbcnt, FirstActiveLane);
+  } else {
----------------
pravinjagtap wrote:
> foad wrote:
> > I don't think you need to change any of this. The original way of doing the icmp should work in all cases.
> Actually, no. In WWM, only lane 0 updates the final value in a wavefront (this is always the case), whereas in the iterative approach the first active lane updates the final value, and the first active lane is not always lane 0.
No, even in the DPP case, the atomic is executed by the first active lane, not lane 0. This happens after exiting the WWM section.
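The point about the first active lane can be checked with a small model. `Mbcnt` counts the active lanes below the current lane, so in the first active lane it is 0, whichever lane that happens to be; `readfirstlane(Mbcnt)` is therefore always 0, and `Mbcnt == readfirstlane(Mbcnt)` selects exactly the same lane as a plain comparison against 0. The sketch below assumes the original condition is `Mbcnt == 0` over the exec mask, and `mbcnt`, `firstActiveLane`, `isLeaderViaReadFirstLane` and `isLeaderViaZero` are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical CPU model of a 64-lane wavefront; not LLVM code.
constexpr int kWaveSize = 64;

// Models mbcnt_lo/mbcnt_hi over the exec mask: the number of active
// lanes with an index strictly below `lane`.
uint32_t mbcnt(uint64_t exec, int lane) {
  assert(lane >= 0 && lane < kWaveSize);
  uint64_t below = exec & ((uint64_t{1} << lane) - 1);
  uint32_t n = 0;
  for (; below != 0; below &= below - 1) ++n;  // popcount
  return n;
}

// Lowest-numbered active lane, i.e. the lane readfirstlane reads from.
int firstActiveLane(uint64_t exec) {
  assert(exec != 0 && "at least one lane must be active");
  int lane = 0;
  while (((exec >> lane) & 1) == 0) ++lane;
  return lane;
}

// Condition in the patch: Mbcnt == readfirstlane(Mbcnt).
bool isLeaderViaReadFirstLane(uint64_t exec, int lane) {
  return mbcnt(exec, lane) == mbcnt(exec, firstActiveLane(exec));
}

// Assumed original condition: Mbcnt == 0.
bool isLeaderViaZero(uint64_t exec, int lane) {
  return mbcnt(exec, lane) == 0;
}
```

Only active lanes evaluate the condition, so the two predicates only need to agree on lanes whose exec bit is set, which they do because `mbcnt(exec, firstActiveLane(exec))` is always 0.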

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D147408