[PATCH] D147408: [AMDGPU] Iterative scan implementation for atomic optimizer.

Fri Apr 28 21:44:53 PDT 2023

pravinjagtap added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:664-665
+  if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+    Compute = BasicBlock::Create(C, "Compute", F);
+    ComputeEnd = BasicBlock::Create(C, "ComputeEnd", F);
+  }
----------------
foad wrote:
> Sink this down to line 700, where you use them?

`ComputeEnd` is needed at lines 766 and 770, after the `if (ValDivergent)` block. That's why it is hoisted here.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUAtomicOptimizer.cpp:740-746
+  if (ValDivergent && ScanImpl == ScanOptions::Iterative) {
+    // Only the first active lane will enter the new control flow to update the
+    // value.
+    CallInst *const FirstActiveLane =
+        B.CreateIntrinsic(Intrinsic::amdgcn_readfirstlane, {}, Mbcnt);
+    Cond = B.CreateICmpEQ(Mbcnt, FirstActiveLane);
+  } else {
----------------
foad wrote:
> I don't think you need to change any of this. The original way of doing the icmp should work in all cases.

Actually, no. In WWM, only lane 0 updates the final value for a wavefront (this is always the case), whereas in the iterative approach the `first active lane` updates it, and the first active lane is not always lane 0.
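
To illustrate that last point, here is a rough host-side sketch. It is not code from this patch: `firstActiveLane` and the 64-bit `exec` mask argument are hypothetical names used only to model which lane is the lowest active one for a given execution mask.

```cpp
#include <cstdint>

// Hypothetical helper (assumption, not part of the patch): returns the index
// of the lowest set bit in a 64-bit execution mask, i.e. the first active
// lane, or -1 if no lane is active.
int firstActiveLane(std::uint64_t exec) {
  if (exec == 0)
    return -1;
  int lane = 0;
  // Shift right until the lowest bit is set; the shift count is the lane index.
  while ((exec & 1) == 0) {
    exec >>= 1;
    ++lane;
  }
  return lane;
}
```

With every lane active (`~0ull`) this returns 0, but with only lanes 4 to 7 active (`0xF0`) it returns 4, so a check that assumes lane 0 would pick a lane that is not running.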

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147408/new/

https://reviews.llvm.org/D147408